AFOSR-TR-84-1159

AI&DS


Final Report

TR-1014-4

DESIGN OF A PICTORIAL PROGRAM REFERENCE LANGUAGE




Eric A. Domeshek
Jeffrey S. Dean
Susan G. Rosenbaum
Brian P. McCune


Advanced  Information  &  Decision  Systems 
201 San Antonio Circle, Suite 286
Mountain View, CA 94040-1270


August 1984


Final Technical Report for June 1, 1983 to May 31, 1984


Approved for public release; distribution unlimited


Prepared  for: 


United  States  Air  Force 
Air Force Office of Scientific Research
Building 410

Bolling Air Force Base, D.C. 20332




The views, opinions, and/or findings contained in this report are those of the
author(s) and should not be construed as an official Department of the Air Force
position, policy, or decision, unless so designated by other official documentation.


ADVANCED  INFORMATION  &  DECISION  SYSTEMS 

Mountain  View,  CA  94040 




SECURITY CLASSIFICATION OF THIS PAGE


REPORT DOCUMENTATION PAGE


1a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED

1b. RESTRICTIVE MARKINGS:

2a. SECURITY CLASSIFICATION AUTHORITY:

2b. DECLASSIFICATION/DOWNGRADING SCHEDULE:

3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited.

4. PERFORMING ORGANIZATION REPORT NUMBER(S): TR-1014-4

5. MONITORING ORGANIZATION REPORT NUMBER(S): AFOSR-TR-84-1159


6a. NAME OF PERFORMING ORGANIZATION: Advanced Information and Decision Systems

6b. OFFICE SYMBOL (if applicable):

6c. ADDRESS (City, State and ZIP Code): 201 San Antonio Circle, Suite 286, Mountain View CA 94040-1270

7a. NAME OF MONITORING ORGANIZATION: Air Force Office of Scientific Research

7b. ADDRESS (City, State and ZIP Code): Directorate of Mathematical and Information Sciences, Bolling AFB DC 20332


8a. NAME OF FUNDING/SPONSORING ORGANIZATION: AFOSR

8b. OFFICE SYMBOL (if applicable): NM

8c. ADDRESS (City, State and ZIP Code): Bolling AFB DC 20332

9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: F49620-81-C-0067

10. SOURCE OF FUNDING NOS.: PROGRAM ELEMENT NO. 61102F; PROJECT NO.; WORK UNIT NO.


11. TITLE (Include Security Classification): DESIGN OF A PICTORIAL PROGRAM REFERENCE LANGUAGE

12. PERSONAL AUTHOR(S): Eric A. Domeshek, Jeffrey S. Dean, Susan G. Rosenbaum, and Brian P. McCune

13a. TYPE OF REPORT: Final

13b. TIME COVERED: from 1/6/83 to 31/5/84

14. DATE OF REPORT (Yr., Mo., Day): 31 AUG 84

15. PAGE COUNT: 70


18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number):

Program Reference Language (PRL); Extended Program Model
(EPM); Intelligent Program Editor (IPE); program documentation;
artificial intelligence (AI); CONTINUED


19. ABSTRACT (Continue on reverse if necessary and identify by block number)


This report covers the work done during the third year of the Program Reference Language
(PRL) project. During this year we focused on the problem of developing an adequate means
to express the types of queries we had earlier identified as within the province of the
PRL. Thus, we studied both the structure of the actual query language and the design of
the user interface. The query language on which we concentrated was a pictorial interface
designated the PRL Pictorial Language (PRL/PL). The essential idea is that the users
build templates to sketch out what an item that satisfied the query would look like. For
programs, this means specifying an arrangement of standard program fragments that
characterize the desired part of the program. This document will discuss the design and
use of this pictorial language.


20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED / SAME AS RPT. / DTIC USERS

21. ABSTRACT SECURITY CLASSIFICATION: UNCLASSIFIED

22a. NAME OF RESPONSIBLE INDIVIDUAL: Dr. Robert Buchal

22b. TELEPHONE NUMBER:

22c. OFFICE SYMBOL:


DD FORM 1473, 83 APR


EDITION OF 1 JAN 73 IS OBSOLETE


ITEM  #18,  SUBJECT  TERMS,  CONTINUED:  knowledge  base;  multiple  representations;  protocol 
analysis;  user  modeling;  retrieval  language;  debugging  program  cliches;  program  annotations. 



TABLE OF CONTENTS


1.  INTRODUCTION

1.1  OVERVIEW

1.2  RESEARCH OBJECTIVES

1.3  GUIDE TO READING

2.  PRL PICTURE LANGUAGE: PRL/PL

2.1  INTRODUCTION TO PRL PICTURE LANGUAGE

2.2  DESCRIPTION OF PRL/PL

3.  FORMAL QUERY LANGUAGE: PRL/FL

4.  PRL/PL EDITOR INTERFACE

5.  PLANS FOR FURTHER DEVELOPMENT

5.1  QUESTIONS/ISSUES

5.2  FUTURE WORK

6.  PERSONNEL

6.1  PERSONNEL

6.2  INTERACTIONS

6.3  PUBLICATIONS

7.  REFERENCES

APPENDIX A.

APPENDIX B.

APPENDIX C.

LIST OF FIGURES


1:  Find The Functions That Contain Loops

2:  Find The Loops Contained In Functions

3:  Find All Functions Containing Loops And If-Statements

4:  Find All Functions Containing A Loop Followed By An If-Statement

5:  Find All Functions Which Contain Loops That Contain If-Statements

6:  Find The Function Named Bar

7:  Find All Functions That Use The Variable Foo

8:  Find All Functions Containing Loops Or If-Statements

9:  Find All Functions Containing Loops And If-Statements

10:  Find All Functions Containing Loops Or If-Statements

11:  Find The Functions That Do Not Contain Loops

12:  Find The If-Statements Not Contained In Loops

13:  Find The Functions Which Have A Loop Not Followed By An If-Statement

14:  Find The Functions In Which All Loops Contain If-Statements

15:  Find The Functions Which Contain 2 Loops That Contain If-Statements

16:  Find The Functions Which Contain An If-Statement Not Contained In A Loop

17:  Find Those Functions That Use Some Variable Before Setting That Variable

18:  Find Those Functions In Which A Variable Is Not Set Before That Variable Is Used


Introduction 


Section  1 


1.  INTRODUCTION 

This report documents the third year of work on the Program Reference
Language (PRL) project, a basic research effort aimed at creating
a mechanism for flexibly identifying the interesting portions of programs. During
this year we focused on developing a means of expressing the types of
queries identified earlier as within the province of the PRL. We studied both the
structure of the actual query language and the design of the user interface.

The PRL is designed to allow the user to describe a piece of a program so
that an automated search mechanism can retrieve any matching program fragments.
An extended PRL might also allow the user to specify transformations to
be performed on a selected set of program fragments, but until we have time to
develop and classify a useful set of such transformations, we consider only the
problem of specifying and performing searches. The work that preceded this
study is discussed at length in the annual reports for the first and second years of
research and is recapped only briefly below. (See “Searching a Knowledge
Base of Programs and Documentation” [Shapiro-83] and “An Informal Study of
Program Comprehension” [Domeshek-84] for more details.) This document
focuses on the design of the PRL Picture Language, the pictorial interface to the
underlying database.


1.1  OVERVIEW

Earlier PRL research led to the definition of the Extended Program Model
(EPM) [Shapiro-83], a database which holds multiple representations of computer
programs. The particular forms in which programs were to be stored, the types
of analyses required to generate these representations, and the information made
available by these analyses were considered. Viewing the EPM as a database
leads naturally to a view of the PRL as a database query language. Our goals for
the PRL, combined with our design for the EPM, guided us in the design of the
query language.

Study of existing database query languages indicated that the task of
designing a formal query language would not be trivial. Although the definition
of a language capable of expressing the required searches might be straightforward,
such a language would not be easily usable by programmers. However,
such a formal basis is essential for the PRL, if only for internal use. Thus, while
some time was spent studying the issues of a formal query language, the majority
of the effort was spent investigating two options for more
“friendly” query languages that could be translated to the formal language.

The first option considered was a limited natural language interface,
designated the PRL Natural Language (PRL/NL). There are already several
natural language interfaces designed for database applications, and early PRL
examples had always phrased sample queries in this manner. Our own experience
with English renderings of PRL queries led us to believe that they tended to
obscure the regularity of the actual relationships being expressed. A survey of
the literature revealed many problems complicating the design of a compact and
consistent subset of English intended for use as a query language.

The second option, and the one on which we concentrated, was a pictorial
interface, designated the PRL Picture Language (PRL/PL). This approach was
originally inspired by the Query-By-Example (QBE) database query language
[Zloof-1977]; however, the Picture Language is considerably different from QBE.
In the PRL, obvious specific knowledge about the structure of the database has
been incorporated to increase its power and decrease its complexity. The essential
idea behind QBE and PRL/PL is that a user partially sketches out the picture
of the requested item. For the PRL, this means specifying an arrangement
of program fragments that characterizes the desired part of the program.


1.2  RESEARCH  OBJECTIVES 


Some  of  the  key  research  issues  driving  the  PRL  effort  are: 


1.  What are the most useful ways of referring to parts of a program? Said
in a different way, what vocabulary do programmers currently use to
describe portions of their programs?

2.  What information must be included in a knowledge base about programs
and documentation in order for it to support program search?

3.  What information must be included in such a knowledge base for it to
support a variety of intelligent tools for accessing and manipulating
code?

4.  How should information of this kind be represented?

5.  How should application specific knowledge be included?

6.  How can user-supplied assertions and other documentation be acquired
and integrated into a knowledge base for use in program referencing and
other tasks?

7.  How can search requests be expressed in a uniform reference language?

8.  What form of a search mechanism is required to implement these reference
requests?

9.  How can these searches be performed efficiently? In what ways can
search be limited or deferred in order to maintain good response time?


1.3  GUIDE  TO  READING 

The following sections provide details about the PRL. Section 2 introduces
the PRL Picture Language (PRL/PL) and gives examples of its usage. Section 3
discusses the PRL Formal Language (PRL/FL). Section 4 provides a description of
the editor interface for the PRL/PL; Section 5 discusses some of the problems still
remaining with the PRL/PL and PRL/FL and our future research plans. This
report concludes with discussion of key research personnel and their activities.


2.  PRL PICTURE LANGUAGE: PRL/PL


2.1  INTRODUCTION  TO  PRL  PICTURE  LANGUAGE 

The PRL Picture Language provides a user-friendly interface, making it
easy for programmers to specify searches through programs by freely combining
information from the multiple representations of the program maintained by the
Extended Program Model, or EPM. There is a formal mapping from these pictures
to the PRL Formal Language. An underlying operational semantics for the
Formal Language thus provides a formal means for interpreting pictures
expressed in the Picture Language.

The PRL/PL is similar to QBE in that a query is specified by describing a
typical item that would satisfy the query; however, as the name implies, the
interface is picture oriented. QBE is designed for relational databases; its templates
are partially filled-in tables representing the various relations. Due to the
internal tree/network format underlying the EPM database, the PRL is not well
suited to the relational model. The templates for PRL/PL queries reflect this
network structure in that a major form of composition is nesting of boxes inside
other boxes.


The PRL/PL is intended to be intuitive and easy to use for typical simple
requests; however, it shares the problem found in QBE in that complex requests
begin to require complex templates. Since the spatial relations of pieces of the
PRL/PL query tend to be meaningful, though, this problem should not be as
severe. With the PRL/PL, the query is composed of boxes that form a picture of
the overall shape of the query, while in QBE, the user must work with many
separate tables representing the database relations.

The goal in designing a pictorial query language is to ensure that any query
specifiable in the formal language has at least one pictorial representation. The
PRL/PL must:

•  Provide a mapping to the formal query language

•  Be simple and intuitive

•  Retain the integrity of pictorial query fragments across different contexts,
which implies maintaining composability across different environments


2.2  DESCRIPTION  OF  PRL/PL 

Picture queries are composed of boxes representing fragments of programs.
The vocabulary of the PRL/PL consists of all of the program fragments understood
by the EPM. Each fragment is represented by a box. The basic operator
of composition is containment; if a picture shows a box inside another box, this
implies a match against the database in which the fragments of code have the
same relationship.

The picture representing the query “Find the FUNCTIONS that contain
LOOPS” is shown in figure 1. Note that there is a box of type function with a
box of type loop inside of it. Fragments of code which are function definitions
that include loops will match this search template.

FUNCTION 


LOOP 


Figure 1:  FIND THE FUNCTIONS THAT CONTAIN LOOPS

A convention used in PRL/PL pictures is that the object to be returned as
a result of the query is shown in a highlighted box. Consider the alternative
query “Find all LOOPS contained in FUNCTIONS” shown in figure 2. Note
that the box representing the loop is now drawn with heavier lines. This query
will return a set of loops as its result.

FUNCTION 


LOOP 


Figure  2:  FIND  THE  LOOPS  CONTAINED  IN  FUNCTIONS 
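The containment queries of figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `kind` strings, and the `descendants` helper are hypothetical stand-ins for EPM objects and are not part of the PRL design. The same template drives both queries; only the highlighted (returned) box differs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an EPM program object."""
    kind: str
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every object nested at any depth inside node."""
    for child in node.children:
        yield child
        yield from descendants(child)


# A toy program: one function with a loop, one without.
program = [
    Node("function", [Node("loop", [Node("if-statement")])]),
    Node("function", [Node("if-statement")]),
]

# Figure 1: the FUNCTION box is highlighted, so functions are returned.
functions_with_loops = [
    f for f in program if any(d.kind == "loop" for d in descendants(f))
]

# Figure 2: the LOOP box is highlighted, so loops are returned.
loops_in_functions = [
    d for f in program for d in descendants(f) if d.kind == "loop"
]
```

Both comprehensions walk the same nesting structure, which mirrors the point that the picture itself is unchanged between the two figures.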

Queries can be much more complex than these first examples. We will survey
the variations and additional features of the PRL/PL, introducing each new
feature with an illustrative example.

It is possible to specify that an object contain more than one object. The
basic form of such a compound containment is illustrated in figure 3, the picture
representing the query “Find all FUNCTIONS containing LOOPS and IF-STATEMENTS.”
The box representing the function now contains two other
boxes, one representing a loop, the other representing an if-statement. The search
is again intended to return a set of functions, but now only those that contain
both a loop and an if-statement are valid matches. Note that there is an implicit
conjunction of the inner boxes.


FUNCTION 


Figure  3:  FIND  ALL  FUNCTIONS  CONTAINING  LOOPS 
AND  IF-STATEMENTS 


If we wanted to perform the search “Find all FUNCTIONS containing a
LOOP followed by an IF-STATEMENT,” the graphical query would look like
figure 4. Note the arrow drawn from the box representing the loop to the box
representing the if-statement; this is the pictorial representation of precedence, or
textual ordering. It may be possible to extend this idea to represent flow ordering,
as this also might be a useful search constraint and the information can be
generated from the EPM. Precedence relations are restricted to apply to objects
taking part in a conjunction. Note that without the arrow, figure 4 becomes
identical to figure 3.
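The difference between the plain conjunction of figure 3 and the ordered conjunction of figure 4 might be sketched as follows. The `Node` fields `start` and `end` (textual offsets) are assumptions introduced for this example, as is the reading of "followed by" as "the first object ends before the second begins"; the report does not define textual ordering at this level of detail.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """Hypothetical program object with a textual span (assumed fields)."""
    kind: str
    start: int
    end: int
    children: list = field(default_factory=list)


def find(node, kind):
    """Collect every object of the given kind nested inside node."""
    found = []
    for child in node.children:
        if child.kind == kind:
            found.append(child)
        found.extend(find(child, kind))
    return found


def contains_both(fn, first, second):
    """Figure 3: implicit conjunction of the inner boxes."""
    return bool(find(fn, first)) and bool(find(fn, second))


def contains_in_order(fn, first, second):
    """Figure 4: some `first` object ends before some `second` object starts."""
    pairs = product(find(fn, first), find(fn, second))
    return any(a.end <= b.start for a, b in pairs)


loop_then_if = Node("function", 0, 90,
                    [Node("loop", 10, 30), Node("if-statement", 40, 60)])
if_then_loop = Node("function", 0, 90,
                    [Node("if-statement", 10, 30), Node("loop", 40, 60)])
```

Here both toy functions satisfy `contains_both`, but only `loop_then_if` satisfies `contains_in_order(fn, "loop", "if-statement")`, which is the effect of adding the arrow.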


FUNCTION 

LOOP 


Figure 4:  FIND ALL FUNCTIONS CONTAINING A LOOP FOLLOWED
BY AN IF-STATEMENT

The nesting of boxes within boxes is not limited to a single level. For
example, the query “Find all FUNCTIONS which contain LOOPS that contain
IF-STATEMENTS” is illustrated in figure 5. Again, multiple objects can be contained
and arbitrary precedence relations can be specified at any level of nesting.

FUNCTION


LOOP


IF-STATEMENT


Figure 5:  FIND ALL FUNCTIONS WHICH CONTAIN LOOPS
THAT CONTAIN IF-STATEMENTS

Objects stored in the EPM have an explicit structure. For example, a function
is composed of a name, parameter-list, and body. Such named parts of the
object are called slots and may be used in creating the pictorial query. The
query “Find the FUNCTION named BAR” is shown in figure 6 as an example.
In specifying the picture, the user asked to see the slots of the function box. By
placing the string “BAR” in the sub-box labeled name, the user specifies that the
string “BAR” must be contained in the name part of the function. When slots
are not used, the containment will be considered satisfied if it occurs in any of
the subparts of the outer object.
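A minimal sketch of slot-restricted versus unrestricted containment is given below. It assumes, purely for illustration, that a function object is a dictionary whose keys are the slot names mentioned in the text and whose values are plain strings; the `matches` helper is hypothetical.

```python
def matches(obj, text, slot=None):
    """Check whether `text` occurs in one named slot, or in any slot if none is named."""
    if slot is not None:
        return text in obj.get(slot, "")
    return any(text in value for value in obj.values())


# Hypothetical function objects; slot names follow the text, values are strings here.
functions = [
    {"name": "BAR", "parameter-list": "x y", "body": "return x + y"},
    {"name": "BAZ", "parameter-list": "n", "body": "return BAR(n, 1)"},
]

# Figure 6: "BAR" must appear in the name slot.
named_bar = [f["name"] for f in functions if matches(f, "BAR", slot="name")]

# Without a slot, "BAR" may appear in any subpart of the function.
mentions_bar = [f["name"] for f in functions if matches(f, "BAR")]
```

The second query also returns `BAZ`, because its body mentions `BAR`; naming the slot is what narrows the match to the function actually called `BAR`.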


FUNCTION 


Figure 6:  FIND THE FUNCTION NAMED BAR

Additional relations are introduced through the vocabulary of object types.
For example, figure 7 illustrates the specification of the query “Find all FUNCTIONS
that use the VARIABLE FOO.” The box labeled uses represents a data
flow object in the EPM database. It has a single slot which holds the object that
is being “used.” Similarly, the relationship of one function calling another can be
represented by the containment of a calls box, another valid EPM object, which
in this case is shown from the control flow perspective. Since most important
information about the program is represented explicitly in one of the views of the
EPM, almost any important statement about a program can be made in terms of
containment of objects.

FUNCTION


Figure  7:  FIND  ALL  FUNCTIONS  THAT  USE  THE  VARIABLE  FOO 

A pictorial query can be composed of more than one top level box. All
separate subpictures in the query that return some type of object must return the
same type of object. The sets of that object type generated by the separate pictures
are unioned together to form a single set as the answer to the entire query.
Thus having separate top level boxes implies an OR operation, allowing the
results from several variants of a simple query to be combined. For example,
figure 8 shows the picture for the query “Find all the FUNCTIONS that contain
LOOPS or IF-STATEMENTS.”

IF-STATEMENT


Figure 8:  FIND ALL FUNCTIONS CONTAINING LOOPS
OR IF-STATEMENTS

A special pair of boxes called AND and OR boxes are used to clarify
representation of conjunctions and disjunctions. The boxes can be placed anywhere
normal program object boxes are legal, but they have a special effect on
the interpretation of the boxes they contain. Strictly speaking, these boxes are
not essential, as any request can be constructed without them. However, they
are important from a user interface perspective (i.e., they make the PRL/PL
easier to use), not from a mathematical perspective.

An AND box is satisfied if all the things it contains are satisfied. Figure 9
illustrates the use of an AND box to represent the query “Find the FUNCTIONS
containing LOOPS and IF-STATEMENTS.” Note that every normal program
object box functions implicitly like an AND box, as illustrated by the equivalence
of figure 3 to figure 9. The primary use for AND boxes is where the default
interpretation would be OR, such as within an OR box, or at the top level of the
query.


FUNCTION 

AND 

LOOP 


Figure 9:  FIND ALL FUNCTIONS CONTAINING LOOPS
AND IF-STATEMENTS

An OR box is satisfied if at least one of the things it contains is satisfied.
Figure 10 illustrates the use of an OR box to represent the query “Find the
FUNCTIONS containing LOOPS or IF-STATEMENTS.” Note that in this case,
the OR box is needed to force the desired interpretation, rather than that gained
by the default of figure 3; it should also be noted that this is the same query as
that represented in figure 8.
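The AND and OR box semantics, and the equivalence between an OR box and separate top level pictures, can be illustrated with a small recursive evaluator. The tuple encoding of boxes, the `satisfied` and `run` helpers, and the toy database are all assumptions made for this sketch rather than anything specified by the PRL.

```python
def satisfied(box, kinds):
    """Evaluate a box against the set of object kinds a function contains.

    A box is either a kind name or a tuple ("and" | "or", sub-box, ...).
    """
    if isinstance(box, str):
        return box in kinds
    operator, *parts = box
    results = [satisfied(part, kinds) for part in parts]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unknown box type: {operator}")


# Hypothetical database: function name -> kinds of objects it contains.
functions = {
    "f": {"loop", "if-statement"},
    "g": {"loop"},
    "h": set(),
}


def run(box):
    """Return the names of the functions whose contents satisfy the box."""
    return {name for name, kinds in functions.items() if satisfied(box, kinds)}


figure_9 = run(("and", "loop", "if-statement"))
figure_10 = run(("or", "loop", "if-statement"))
# Figure 8: two top level pictures whose result sets are unioned.
figure_8 = run("loop") | run("if-statement")
```

In this toy database `figure_8` and `figure_10` come out as the same set, matching the observation that the two pictures express the same query.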


FUNCTION 


OR 


LOOP 


IF-STATEMENT 


Figure 10:  FIND ALL FUNCTIONS CONTAINING LOOPS
OR IF-STATEMENTS

When constructing a search template in the PRL/PL, users may also
specify objects whose appearance would invalidate the match. The notation for
negation is to use a negative image (e.g., reverse video) in displaying the negated
box. In the figures illustrated in this paper, slashes will be used to represent
reverse video. For example, figure 11 shows the picture representing the query
“Find the FUNCTIONS that do not contain LOOPS.” It is also possible to
negate parts of the context for an object which must appear. For example, figure
12 shows the query “Find the IF-STATEMENTS which are not contained in
LOOPS.”

Figure 12:  FIND THE IF-STATEMENTS NOT CONTAINED IN LOOPS

When used in combination with precedence arrows, a negated box will only
disqualify a potential matching code fragment if the disallowed object appears in
the specified relation to other objects. For example, figure 13 represents the
query “Find the FUNCTIONS that have a LOOP not followed by an IF-STATEMENT.”
A matching FUNCTION may contain IF-STATEMENTS, as
long as there is a LOOP which textually follows them.


PRL  Picture  Language:  PRL/PL 


Section  2 


FUNCTION 


Figure  13:  FIND  THE  FUNCTIONS  WHICH  HAVE  A  LOOP 
NOT  FOLLOWED  BY  AN  IF-STATEMENT 
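Plain negation (figure 11) and negation combined with precedence (figure 13) might be sketched as below. The sketch assumes that a function body has been flattened into a list of statement kinds in textual order; that flattening and both helper names are hypothetical.

```python
def has_no_loop(statements):
    """Figure 11: a negated LOOP box disqualifies any function containing a loop."""
    return "loop" not in statements


def has_loop_not_followed_by_if(statements):
    """Figure 13: some loop has no if-statement textually after it."""
    return any(
        kind == "loop" and "if-statement" not in statements[index + 1:]
        for index, kind in enumerate(statements)
    )


# Statement kinds in textual order for three toy function bodies.
if_before_loop = ["if-statement", "loop"]
if_after_loop = ["loop", "if-statement"]
loop_at_end = ["loop", "if-statement", "loop"]
```

Under this reading `if_before_loop` and `loop_at_end` match figure 13 even though they contain if-statements, because each has a loop with nothing disallowed after it.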

When a box representing a program object is instantiated, it is a statement
that such an object must be present for a candidate region of code to be a successful
match; all that is required is that one such object exist. Thus, all objects
in PRL/PL queries are, by default, assumed to be existentially quantified.

There are other possible meanings a user might want to express. The
PRL/PL allows the explicit specification of either universal quantification or
integer ranges for cardinality constraints on program objects. The operation of
negation also has an implicit effect on the quantification of the negated terms.

Figure 14 depicts the query “Find the FUNCTIONS in which all LOOPS
contain IF-STATEMENTS.” Compare this with figure 5; the two differ only in
the explicit quantification on the box representing LOOP. The effect is that the
mere existence of a LOOP containing an IF-STATEMENT is not enough to
guarantee that a FUNCTION will pass the test. If there are other LOOPS contained
in the FUNCTION which do not contain IF-STATEMENTS, then the
FUNCTION will fail to satisfy the specified condition. Also note that it is not
required that a FUNCTION contain any LOOPS to pass the test; if it has no
LOOPS, then all the LOOPS it has contain IF-STATEMENTS. (Or, to put it
another way, there is no LOOP which does not have an IF-STATEMENT.)

FUNCTION


Figure  14:  FIND  THE  FUNCTIONS  IN  WHICH  ALL  LOOPS  CONTAIN 

IF-STATEMENTS 
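The contrast between the default existential reading (figure 5) and the universal reading (figure 14), including the case of a function with no loops, can be seen in a short Python sketch. The dictionary layout and names are assumptions chosen for illustration only.

```python
# Hypothetical database: function name -> list of loops, each loop given
# as the list of statement kinds it contains.
functions = {
    "f": [["if-statement"], ["if-statement", "call"]],
    "g": [["if-statement"], ["call"]],
    "h": [],  # no loops at all
}


def loop_has_if(loop):
    """True if the loop contains an if-statement."""
    return "if-statement" in loop


# Figure 5 (default, existential): at least one loop contains an if-statement.
some_loop = sorted(
    name for name, loops in functions.items() if any(map(loop_has_if, loops))
)

# Figure 14 (universal): every loop contains an if-statement.
every_loop = sorted(
    name for name, loops in functions.items() if all(map(loop_has_if, loops))
)
```

Function `h` fails the existential query but passes the universal one, since `all` over an empty collection is true; this is the same vacuous-truth behavior the text describes.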

Figure 15 gives an example of a PRL/PL query that contains a cardinality
constraint. A cardinality constraint can be any number of natural numbers or
natural number ranges. The figure represents the query “Find the FUNCTIONS
which contain 2 LOOPS that contain IF-STATEMENTS.”

FUNCTION


Figure  15:  FIND  THE  FUNCTIONS  WHICH  CONTAIN  2  LOOPS  THAT 

CONTAIN  IF-STATEMENTS 
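A cardinality constraint of this kind might be checked as in the following sketch. Two assumptions are made that the report does not state: a constraint is encoded as a list whose items are either single numbers or inclusive `(low, high)` pairs, and "2 LOOPS" is read as exactly two matching loops. The helper names are hypothetical.

```python
def count_allowed(count, constraint):
    """True if count equals a listed number or lies in a listed (low, high) range."""
    for item in constraint:
        if isinstance(item, tuple):
            low, high = item
            if low <= count <= high:
                return True
        elif count == item:
            return True
    return False


def matches(loops, constraint):
    """Count the loops containing an if-statement and test the count."""
    count = sum(1 for loop in loops if "if-statement" in loop)
    return count_allowed(count, constraint)


# Each function is given as a list of loops; each loop lists the kinds it contains.
two_matching_loops = [["if-statement"], ["if-statement"], ["call"]]
one_matching_loop = [["if-statement"]]
```

With this encoding the query of figure 15 corresponds to the constraint `[2]`, while a constraint such as `[(1, 3)]` would accept between one and three matching loops.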

Figure 16 shows an example with a negated box. The query means “Find
the FUNCTIONS which contain an IF-STATEMENT not contained in a LOOP.”
Note that the quantification on the negated LOOP box has effectively been
flipped to universal. The query means that for all the LOOPS in the FUNCTION,
there is an IF-STATEMENT not contained in any of them. If the
quantification had not been changed, the query would require the existence of
some LOOP that did not contain an IF-STATEMENT, and other LOOPS in the
FUNCTION could still exist that did contain an IF-STATEMENT.


FUNCTION 


Figure  16:  FIND  THE  FUNCTIONS  WHICH  CONTAIN  AN  IF-STATEMENT 

NOT  CONTAINED  IN  A  LOOP 

The PRL/PL can handle query meta-variables. The notation for these variables
is a character string surrounded by angle brackets. For example, <foo>,
<x> and <this-is-a-variable> are all valid query meta-variables. Such variables
may be contained in a box and are bound to a set of program objects of the
type represented by that box. The effective binding of a query meta-variable is
the intersection of the sets generated by its several uses in a query.

Meta-variables are generally introduced into a query to designate the same
object as it appears in more than one context. For example, figure 17 illustrates
the query “Find all FUNCTIONS that USE a VARIABLE before SETTING that
VARIABLE.” Note that the meta-variable <x> appears in two places in the
query, and is meant to refer to the same object. In this case, <x> represents a
variable that is first used and then set.
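The rule that a meta-variable's effective binding is the intersection of the sets generated by its uses can be shown directly with Python sets. The variable names and the two candidate sets below are invented for the example.

```python
# Hypothetical result sets: the variables that fit each box in which <x> appears.
candidates_per_use = [
    {"count", "total", "index"},   # variables found in a `uses` box
    {"total", "index", "limit"},   # variables found in a `sets` box
]

# The effective binding is the intersection of the sets from every use.
binding = set.intersection(*candidates_per_use)
```

Only `total` and `index` survive, since they are the only variables that fit both places where `<x>` occurs.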

FUNCTION 


Figure 17:  FIND THOSE FUNCTIONS THAT USE SOME VARIABLE BEFORE
SETTING THAT VARIABLE

Actually, the query in figure 17 does not accomplish what was probably
intended: checking for any use of a variable before it receives an initial value.
Figure 18 accurately captures the intended meaning and illustrates the use of
negation in combination with the precedence relation. Figure 18 translates as
“Find the FUNCTIONS that do not SET a VARIABLE before USING that
VARIABLE.” Again note that <x> is used twice in the query and is intended
to refer to the same variable name.
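The gap between figures 17 and 18 is easy to see on a concrete example. The sketch below assumes a function body reduced to a list of `(operation, variable)` events in textual order, and it reads figure 18 as "some use has no earlier set of the same variable"; both the encoding and that reading are assumptions made for illustration.

```python
def uses_before_a_set(events):
    """Figure 17: some use of a variable is followed by a set of that variable."""
    return any(
        op == "use" and ("set", var) in events[i + 1:]
        for i, (op, var) in enumerate(events)
    )


def used_without_prior_set(events):
    """Figure 18: some use of a variable has no earlier set of that variable."""
    return any(
        op == "use" and ("set", var) not in events[:i]
        for i, (op, var) in enumerate(events)
    )


# x is initialized, used, then reassigned: harmless, yet figure 17 flags it.
reassigned = [("set", "x"), ("use", "x"), ("set", "x")]
# y is used before it ever receives a value: the case actually of interest.
uninitialized = [("use", "y"), ("set", "y")]
```

The figure 17 test accepts `reassigned` as well as `uninitialized`, whereas the figure 18 test accepts only `uninitialized`, which is the behavior the text says was intended.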


FUNCTION 


Figure 18:  FIND THOSE FUNCTIONS IN WHICH A VARIABLE IS NOT SET
BEFORE THAT VARIABLE IS USED

Figure 18 makes use of many of the features of the PRL/PL and still
manages to maintain a high degree of comprehensibility. However, as illustrated
by the example in figure 17, PRL/PL offers no protection against sloppy thinking.


3.  FORMAL QUERY LANGUAGE: PRL/FL

The PRL Formal Language is intended to serve as an internal form for PRL
queries. It should be both unambiguous and suitable for machine interpretation.
It is possible that in the future there will be other external forms of the PRL in
addition to the Picture Language. At such time, the Formal Language would
serve as the common underlying representation. Since the PRL/PL may become
cumbersome or counter-intuitive for some complex requests, the issue of a
Natural Language interface may be considered in the future.

The  current  syntax  of  the  PRL/FL  Is  presented  In  BNF  form  In  figure  10. 
This  particular  rendition  Is  presented  In  prefix  operator  form  and  makes  use  of 
some  rather  long-winded  keywords;  It  Is  still  under  development. 

The  least  satisfactory  aspect  of  this  specification  for  the  PRL/FL  Is  Its  han¬ 
dling  of  precedence  relations.  The  problem  Is  that  a  precedence  graph  which  Is 
only  constrained  to  be  non-cycllc  Is  transformed  Into  a  set  of  (possibly  nested) 
binary  relations.  This  may  force  some  of  the  terms  to  be  duplicated.  We  would 
like  the  PRL  to  keep  track  of  the  fact  that  such  duplications  are  merely  artifacts 
of  the  translation;  It  should  create  some  sharable  structure  to  represent  the  dupli¬ 
cated  sub-queries. 

QUERY          = RETURN | CONTAINMENT | NEGATION |
                 DISJUNCTION | CONJUNCTION
RETURN         = (RETURN CONTAINMENT)
CONTAINMENT    = (CONTAINS OBJECT CONTAINABLE)
NEGATION       = (NOT CONTAINMENT)
DISJUNCTION    = (OR QUERY*)
CONJUNCTION    = (AND QUERY* ORDERING*)
ORDERING       = (PRECEDES QUERY QUERY) |
                 (PRECEDES QUERY ORDERING) |
                 (PRECEDES ORDERING QUERY) |
                 (PRECEDES ORDERING ORDERING)
CONTAINABLE    = NIL | STRING | VARIABLE | QUERY
OBJECT         = (TYPE REPRESENTATION QUANTIFICATION)
REPRESENTATION = TEXT | SYNTAX | VARIABLE | QUERY
QUANTIFICATION = EXISTENTIAL | UNIVERSAL | CARDINALITY
CARDINALITY    = A SET OF NONNEGATIVE INTEGERS OR RANGES OF
                 NONNEGATIVE INTEGERS
TYPE           = ANY VALID TYPE OF PROGRAM OBJECT KNOWN TO
                 THE EPM (CONSISTENT WITH THE STATED
                 REPRESENTATION)
NIL            = AN EMPTY MARKER FOR BOXES WHICH ARE NOT
                 CONSTRAINED TO CONTAIN ANYTHING
VARIABLE       = < STRING >
STRING         = OH COME ON!

Figure 19: PRL/FL: BNF SPECIFICATION
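
To make the grammar concrete, the following Python sketch checks whether a nested tuple is a well-formed PRL/FL query under the rules of figure 19. It is an illustration only: the tuple encoding, the modelling of CARDINALITY as a frozenset of integers and of NIL as None, and the USE and SET object types in the sample query are all assumptions, not the project's actual internal form.

```python
REPRESENTATIONS = {"TEXT", "SYNTAX", "VARIABLE", "QUERY"}

def is_quantification(q):
    # EXISTENTIAL | UNIVERSAL | CARDINALITY (assumed: a frozenset of ints)
    if q in ("EXISTENTIAL", "UNIVERSAL"):
        return True
    return isinstance(q, frozenset) and all(
        isinstance(n, int) and n >= 0 for n in q)

def is_object(t):
    # OBJECT = (TYPE REPRESENTATION QUANTIFICATION)
    if not (isinstance(t, tuple) and len(t) == 3):
        return False
    type_name, representation, quantification = t
    return (isinstance(type_name, str)
            and representation in REPRESENTATIONS
            and is_quantification(quantification))

def is_containable(t):
    # CONTAINABLE = NIL | STRING | VARIABLE | QUERY (NIL modelled as None)
    return t is None or isinstance(t, str) or is_query(t)

def is_containment(t):
    # CONTAINMENT = (CONTAINS OBJECT CONTAINABLE)
    return (isinstance(t, tuple) and len(t) == 3 and t[0] == "CONTAINS"
            and is_object(t[1]) and is_containable(t[2]))

def is_ordering(t):
    # ORDERING = (PRECEDES x y), where x and y are each a QUERY or an ORDERING
    return (isinstance(t, tuple) and len(t) == 3 and t[0] == "PRECEDES"
            and all(is_query(a) or is_ordering(a) for a in t[1:]))

def is_query(t):
    if not (isinstance(t, tuple) and t):
        return False
    head, args = t[0], t[1:]
    if head == "CONTAINS":
        return is_containment(t)
    if head in ("RETURN", "NOT"):
        return len(args) == 1 and is_containment(args[0])
    if head == "OR":
        return all(is_query(a) for a in args)
    if head == "AND":
        # QUERY* followed by ORDERING*: split at the first ordering.
        split = next((i for i, a in enumerate(args) if is_ordering(a)),
                     len(args))
        return (all(is_query(a) for a in args[:split])
                and all(is_ordering(a) for a in args[split:]))
    return False

# Hypothetical encoding of a query in the spirit of figure 17.
use_x = ("CONTAINS", ("USE", "SYNTAX", "EXISTENTIAL"), "<x>")
set_x = ("CONTAINS", ("SET", "SYNTAX", "EXISTENTIAL"), "<x>")
query = ("RETURN",
         ("CONTAINS", ("FUNCTION", "SYNTAX", "EXISTENTIAL"),
          ("AND", use_x, set_x, ("PRECEDES", use_x, set_x))))
```

Note that use_x and set_x each appear twice in the sample query, once as a conjunct and once inside the PRECEDES term. In this sketch both occurrences are the same Python object, which is one way to picture the sharable structure discussed in connection with precedence relations.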

The open issues mentioned earlier in the discussion of the PRL/PL are
really PRL/FL considerations. We know how we want the pictures to appear for
all cases of combinations of quantification and negation. We know how they
should be translated into the Formal Language. The open questions have to do
with the interpretation of such queries. This is the province of the interpreter of
the PRL/FL.

The mapping between the PRL/PL and PRL/FL is fairly straightforward.
We already have a prototype unparser working which can take PRL/FL queries
and draw the appropriate pictures. The parser, which will map from pictures to
formal representation, will be part of the PRL/PL Editor Interface.

4. PRL/PL EDITOR INTERFACE

As important as the PRL/PL is the means by which a user enters and
manipulates those queries. We are currently developing a prototype interface for
the language. The interface is a graphical-style editor, with the characteristics of
a syntax-oriented editor; i.e., it knows about the PL and ensures that requests are
syntactically legal.

Given such a syntax editor for PRL/PL queries, the user can at all times
see the whole evolving query as it is being composed and be certain that the
query is syntactically valid. The query editor will take care of such layout issues
as scaling the boxes appropriately and positioning the boxes when precedence
relations are specified.


The prototype system is being developed on a Symbolics 3600 Lisp
Machine, which has a high-resolution bit-mapped display terminal and
mouse-input support. For each representation in the EPM, there will be a set of object
types which may be instantiated as boxes and used in queries. As an alternative
to typing in the name of the type of box to be instantiated, a complete catalog of
types will be available on a set of mouse-sensitive menus. As a simple extension,
the user may also maintain a library of previously constructed queries and
query-fragments. The contents of this library will also be available on a
mouse-sensitive menu. With this facility, commonly used queries or pieces of queries
will always be easily available.

Most of the operations required to specify PRL/PL queries will benefit from
the ability of a mouse to quickly designate a position on the screen. Placement
of boxes in other boxes is natural with the mouse and easily transformable into
the corresponding PRL/FL statement. Similarly, selecting a box for negation or
as a return object is quickly accomplished and easily mapped to the formal
representation. Specification of precedence relations works just as smoothly with
a mouse.

Note that only boxes representing program objects will be displayed
negated; AND and OR boxes when negated will be transformed into the
equivalent positive statement through application of DeMorgan's Law. Also note
that boxes which have been designated as return objects cannot be negated and
that negated boxes cannot be selected as the query result.
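
The DeMorgan transformation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: queries are nested tuples headed by "AND", "OR", or "NOT", any other tuple (here tagged "BOX") stands for a program-object box, and precedence orderings are ignored. None of these names come from the prototype editor.

```python
def push_negation(query, negate=False):
    """Rewrite a query so that NOT applies only to program-object boxes."""
    head = query[0]
    if head == "NOT":
        # A negation flips the pending polarity and disappears.
        return push_negation(query[1], not negate)
    if head in ("AND", "OR"):
        if negate:
            # DeMorgan: NOT (a AND b) == (NOT a) OR (NOT b), and dually.
            head = "OR" if head == "AND" else "AND"
        return (head,) + tuple(push_negation(q, negate) for q in query[1:])
    # Leaf: a box representing a program object.
    return ("NOT", query) if negate else query

negated = ("NOT", ("AND", ("BOX", "a"),
                   ("OR", ("BOX", "b"), ("NOT", ("BOX", "c")))))
positive = push_negation(negated)
```

After the rewrite, the only negated items left are individual boxes, which matches the display rule stated above.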

5. PLANS FOR FURTHER DEVELOPMENT


5.1 QUESTIONS/ISSUES

There are several representation issues that still must be addressed. One of
these involves the meaning of containment, which can be fuzzy and may vary
with the types of objects. While it is clear what it means for one syntactic
structure to contain another, it is less clear when considering more complex database
structures. When searching for structures containing a cliche, the object may be
distributed over several parts of a program. In this case, it may make sense to
interpret containment only to require some overlap.

A problem in the PRL/PL involves the representation of multiple objects.
When multiple objects are contained in some enclosing object, they can be
connected into an arbitrary precedence graph by specifying directional arrows
between any two boxes. Of course it is very easy to draw graphs that represent
unsatisfiable conditions, as shown in figure 20. The system should be able to
detect such configurations and bring them to the user's attention.

[Diagram: FUNCTION box]

Figure 20: AN UNSATISFIABLE PRECEDENCE GRAPH
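
One plausible way to detect an unsatisfiable precedence graph is to look for a cycle among the arrows. The Python sketch below uses Kahn's topological-sort algorithm for this; it is a suggestion under the assumption that boxes are identified by name and arrows are (before, after) pairs, and it is not a description of how the PRL system performs the check.

```python
from collections import defaultdict, deque

def find_unorderable(boxes, arrows):
    """Return the boxes that cannot be placed in any order consistent with
    the arrows (boxes on a cycle, or downstream of one). An empty list
    means the precedence graph is satisfiable."""
    successors = defaultdict(list)
    indegree = {box: 0 for box in boxes}
    for before, after in arrows:
        successors[before].append(after)
        indegree[after] += 1

    # Repeatedly remove boxes that have no remaining predecessors.
    ready = deque(box for box in boxes if indegree[box] == 0)
    while ready:
        box = ready.popleft()
        for nxt in successors[box]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    return sorted(box for box in boxes if indegree[box] > 0)

cyclic = find_unorderable(["A", "B", "C"],
                          [("A", "B"), ("B", "C"), ("C", "A")])
acyclic = find_unorderable(["A", "B", "C"], [("A", "B"), ("A", "C")])
```

The returned boxes could be highlighted in the editor to bring the problem to the user's attention.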

We are not satisfied that we have completely determined the correct
interpretation for all possible combinations of quantifiers and cardinality. In
particular, we do not yet feel comfortable with the interaction of these features with
negation. On the whole, though, the default cases seem to work well to express
useful queries. Improving our understanding of the underlying logic is one
problem area to which we will be devoting more effort in the coming year.

5.2 FUTURE WORK

Part of our effort during the next year will be aimed at addressing the
unresolved issues in the specification of the PRL/PL and the PRL/FL. Work will
continue on the process of translating search requests in the PRL/FL to search
requests in the EPM. In conjunction with this, we will begin looking at the issues
of efficiently processing queries; part of this effort will be to refine the EPM
search methods. An additional task will be to examine the problems involved in
updating the internal database during a user's editing session. We will try to
develop a strategy which allows for updates to the database only when necessary
so that an unreasonable amount of time is not spent propagating changes. We
will also continue the study of alternate interface forms.

6. PERSONNEL


6.1 PERSONNEL

The Program Reference Language (PRL) research project is being
performed within the User Aids Program of AI&DS, with Dr. Brian P. McCune,
Program Manager, as Principal Investigator. Other members of the AI&DS technical
staff who have contributed to the project include Jeffrey S. Dean, Eric A.
Domeshek, Michael A. Brzustowicz, Daniel G. Shapiro, and Susan G. Rosenbaum.

Dr. Brian P. McCune is the Principal Investigator of the PRL project. He
received his Ph.D. in Computer Science from Stanford University in 1979; the
title of his thesis was "Building Program Models Incrementally from Informal
Descriptions." During the past decade, Dr. McCune has done research in the
areas of artificial intelligence, software systems, and computer architecture, with
emphasis on artificial intelligence approaches to software development and
maintenance, information retrieval, database management, hypothesis formation,
planning, and distributed processing. He has been the principal investigator of
research projects to select and design candidate AI tools for assisting in the
maintenance of Ada programs (sponsored by Rome Air Development Center), to
design an intelligent program editor for Ada, to determine the feasibility of
automatically generating operating systems, and to design and implement a
knowledge-based system for textual information retrieval. Dr. McCune is on the
Editorial Advisory Boards of Defense Electronics and The Artificial Intelligence
Report. He has been invited to discuss the application of artificial intelligence to
defense problems numerous times, both at workshops and in published papers.

Jeffrey S. Dean is project leader of the PRL project. He is also currently
leading the related Intelligent Program Editor project, and was previously the
leader of the AI&DS Software Maintenance Project, which defined advanced Ada
tools for software maintenance. He received his Masters degree in Computer
Science/Computer Engineering from Stanford University, where he worked on the
automatic derivation of operating systems. His main research interest is the
application of AI to software tools. He came to AI&DS in January 1981 from
Bell Telephone Laboratories, where he was involved in the development and
maintenance of the UNIX operating system and its utilities.

Daniel G. Shapiro has been contributing to the PRL project since joining
AI&DS in October 1981, after receiving a Masters degree in Electrical
Engineering and Computer Science from the Massachusetts Institute of Technology. His
research interests include artificial intelligence, expert systems, and software
engineering. At AI&DS he has done work on expert systems for program and
documentation editing, information retrieval, and mission planning. Currently,
he is the leader of the Battlefield Commander's Assistant project, a basic research
effort aimed at developing the AI technology required to assist battalion and/or
brigade commanders in planning and evaluating tactics for combat situations.
His masters thesis, entitled "Sniffer: A System that Understands Bugs," involved
the design and implementation of a semantics-based debugger for the
Programmer's Apprentice project at the MIT Artificial Intelligence Laboratory.
He also taught software engineering courses at MIT.

Eric A. Domeshek was responsible for much of the PRL experiment which
studied how people think about programs and has played a key role in the
development of the PRL Picture Language. Mr. Domeshek received an A.B. in
Physics from Harvard College. His course work also emphasized computer science
and cognitive science. His technical interests are in artificial intelligence,
particularly knowledge representation, and computer graphics.

Michael A. Brzustowicz has been involved with the PRL project since
joining AI&DS in November 1983. He received an S.B. degree in Physics from the
Massachusetts Institute of Technology in 1979 and received his M.S.E.E. in
Computer Engineering from Carnegie-Mellon University in 1980; his thesis work was
entitled "A System for the Implementation of Models of Reasoning with
Uncertain Data." Mr. Brzustowicz's current areas of interest include artificial
intelligence, software engineering, ergonomic user interfaces, and computer-aided
processes. Prior to joining AI&DS, Mr. Brzustowicz worked for the Development
Systems Software Group of the Semiconductor Division of Texas Instruments,
and for the Unix Development Group at Bell Laboratories.

Susan G. Rosenbaum has been working with the PRL project since joining
AI&DS in June 1984. Her areas of interest include software engineering, artificial
intelligence, and man-machine interfaces. She received a B.A. degree in
Mathematics from the University of Texas at Austin in 1974 and an M.S. degree
in Computer Science from the University of Texas at Arlington in 1979. Prior to
joining AI&DS, she worked at Computer*Thought Corporation on the design and
development of a prototype tutoring system for the Ada language and at Texas
Instruments in the Computer Science Laboratory.


6.2  INTERACTIONS 

Dr. Brian P. McCune is an Associate Editor of The AI Magazine, the
publication of the American Association for Artificial Intelligence. He is on the
Editorial Advisory Board of Defense Electronics and also The Artificial Intelligence
Report.

Dr. McCune presented a paper on an Intelligent Program Editor at a
Software Engineering Technology Review sponsored by the Navy (July 1984). He
was an invited speaker to COMPSAC '83 (November 1983) and EASCON '83
(September 1983), and was an invited participant to the Knowledge Based Software
Assistant Workshop at AAAI-83 (August 1983). He attended the NAVAIR/ONR
Aviation Software Workshop (October 1983), the DARPA Formalized Software
Development Workshop (November 1983), the Conference on Inference Theory
and AI (November 1982), and the Software Maintenance Workshop (December
1983).


Dr. McCune attended the Eighth International Joint Conference on
Artificial Intelligence (IJCAI-83), held in Karlsruhe, Germany, in August 1983,
and the National Conference on Artificial Intelligence (AAAI-83), Washington,
D.C., August 1983.

Dr. McCune has been interfacing heavily with both operational and
developmental commands in the Air Force and elsewhere in DoD and industry in
order to understand current and future problems of software development and
maintenance. Within the Air Force, Dr. McCune has met with personnel at the
Air Force Office of Scientific Research, Rome Air Development Center, Wright
Aeronautical Laboratories, Foreign Technology Division, Strategic Air Command
headquarters, Air Force Communications Computer Programming Center, and
Air Force Satellite Control Facility. Elsewhere in DoD he has talked with the
Defense Intelligence Agency, Office of the Undersecretary of Defense for Research
and Engineering, Defense Advanced Research Projects Agency, DoD STARS
Program, Ada Joint Program Office, Office of Naval Research, Naval Electronics
Systems Command, Naval Sea Systems Command, Naval Intelligence Command,
Naval Research Laboratory, Naval Ocean Systems Center, Naval Intelligence
Center, Naval Weapons Center, Army Research Office, Army Center for Tactical
Computer Systems, and Army Ballistic Missile Defense Advanced Technology
Center.

Dr. McCune has also visited numerous universities and research centers to
assess the state of the art in automatic programming at first hand. Places visited
include Harvard University, Massachusetts Institute of Technology,
Carnegie-Mellon University, Duke University, University of California at Irvine, and
Stanford University.

Jeffrey S. Dean presented a paper on an Automated Tool for Software
Documentation at a Software Engineering Technology Review sponsored by the
Navy (July 1984). He gave a paper on a study of software maintenance at the
Software Maintenance Workshop (December 1983). He attended the 7th
International Conference on Software Engineering (March 1984), the Symposium for
Application and Assessment of Automated Tools for Software Development
(November 1983), and AAAI-83.

Daniel G. Shapiro was a panelist at the ACM SIGSOFT/SIGPLAN
Software Engineering Symposium on High-Level Debugging, held in Pacific
Grove, California, in March 1983. He presented papers on the PRL at the IEEE
Trends and Applications Conference (May 1983) and the Seventh International
Conference on Software Engineering (March 1984). He presented papers on
information retrieval at AAAI-83 and IJCAI-83.

Eric  A.  Domeshek  attended  the  Symposium  for  Application  and  Assessment 
of  Automated  Tools  for  Software  Development  (November  1983)  and  AAAI-83. 

Michael A. Brzustowicz attended the Symposium for Application and
Assessment of Automated Tools for Software Development (November 1983).

Susan G. Rosenbaum attended the ACM Conference on Lisp and
Functional Programming (August 1982) and the National AdaTEC Conference
(October 1983).

6.3 PUBLICATIONS


Members of the PRL project staff have published a number of papers. A
cumulative chronological list of publications appearing in technical journals and
conference proceedings is given below:

Thomas L. Adams, Andrew S. Cromarty, Brian P. McCune, Gerald A. Wilson,
Milton R. Grinberg, James F. Cunningham, and Carl J. Tollander, "A
Knowledge-Based System for Analyzing Radar Systems," invited paper,
Proceedings, Military Microwaves '84, London, England, October 1984.

Daniel  G.  Shapiro,  Jeffrey  S.  Dean,  and  Brian  P.  McCune,  “A  Knowledge 
Base  for  Supporting  an  Intelligent  Program  Editor,”  7th  International 
Conference  on  Software  Engineering,  March  1984.  (See  Appendix  A.) 

Andrew S. Cromarty, Daniel G. Shapiro, and Michael R. Fehling, "Still
Planners Run Deep: Shallow Reasoning for Fast Replanning," Proceedings,
Society of Photo-Optical Instrumentation Engineers, Technical Symposium
East, 1984, to appear.

Jeffrey S. Dean and Brian P. McCune, "An Informal Study of Software
Maintenance Problems," Proceedings, Software Maintenance Workshop,
December 1983. (See Appendix B.)

Brian P. McCune and Jeffrey S. Dean, "Trends for Advanced Software
Tools," Defense Science 2001+ (reprint of EASCON '83 paper),
December 1983.

Brian P. McCune, Richard M. Tong, Jeffrey S. Dean, and Daniel G.
Shapiro, "RUBRIC: A System for Rule-Based Information Retrieval,"
Proceedings, COMPSAC 1983, November 1983.

Brian P. McCune and Jeffrey S. Dean, "Trends for Advanced Software
Tools," invited paper, Proceedings, EASCON '83, September 1983.
(See Appendix C.)

Richard M. Tong, Daniel G. Shapiro, Brian P. McCune, and Jeffrey S.
Dean, "A Rule-Based Approach to Information Retrieval: Some Results and
Comments," Proceedings, National Conference on Artificial Intelligence,
Washington, D.C., August 1983.

Richard M. Tong, Daniel G. Shapiro, Jeffrey S. Dean, and Brian P.
McCune, "A Comparison of Uncertainty Calculi in an Expert System for
Information Retrieval," Eighth International Joint Conference on
Artificial Intelligence, Karlsruhe, West Germany, August 1983.

Brian P. McCune and Robert J. Drazovich, "Radar with Sight and
Knowledge," invited paper, Defense Electronics, August 1983,

Richard M. Tong and Daniel G. Shapiro, "An Experiment with Multiple
Valued Logics in an Expert System," Proceedings of the IFAC Symposium on
Fuzzy Information, Knowledge Representation and Decision Analysis,
Marseille, France, July 1983.

Daniel G. Shapiro and Brian P. McCune, "The Intelligent Program Editor:
A Knowledge-Based System for Supporting Program and Documentation
Maintenance," Proceedings of the Trends and Applications Conference of
the IEEE, May 1983.

Gerald  Wilson,  Eric  A.  Domeshek,  Ellen  L.  Drascher,  and  Jeffrey  S.  Dean, 
“The  Multipurpose  Presentation  System,"  Proceedings,  Very  Large  Data 
Base  Conference,  1983. 

Jeffrey  S.  Dean  and  Brian  P.  McCune,  "Advanced  Tools  for  Software 
Maintenance",  Rome  Air  Development  Center,  RADC-TR-82-313,  December 
1982. 

Brian P. McCune, Jeffrey S. Dean, Daniel G. Shapiro, and Richard M.
Tong, "Rule-Based Information Retrieval," Workshop on Intelligence
Applications of Advanced Computer and Information Technology: Focus on
Artificial Intelligence, Office of Research and Development, Office
of Scientific and Weapons Research, Central Intelligence Agency,
Washington, D.C., November 1982.

Robert J. Drazovich, Brian P. McCune, and J. Roland Payne, "Artificial
Intelligence: An Emerging Military Technology," invited paper,
Conference Record, EASCON '82: Fifteenth Annual Electronics and
Aerospace Systems Conference, Institute of Electrical and Electronics
Engineers, Inc., Washington, D.C., September 1982, pages 341-348.

Brian P. McCune, editor, "AI at AI&DS," The AI Magazine, Volume 2,
Number 2, Summer 1981, pages 44-47.

Daniel  G.  Shapiro,  “Sniffer:  A  System  that  Understands  Bugs,” 
MIT/AIM/638,  June  1981. 

Brian P. McCune, "Incremental, Informal Program Acquisition,"
Proceedings of the First Annual National Conference on Artificial
Intelligence, Stanford University, Stanford, California, August 1980,
pages 71-73.

Daniel  G.  Shapiro,  “A  Proposal  for  Sniffer,  A  System  that  Understands 
Bugs,”  MIT/AI  Working  Paper  202,  July  1980. 

Cordell Green, Richard P. Gabriel, Elaine Kant, Beverly I. Kedzierski,
Brian P. McCune, Jorge V. Phillips, Steve T. Tappel, and Stephen J.
Westfold, "Results in Knowledge-Based Program Synthesis," IJCAI-79:
Proceedings of the Sixth International Joint Conference on Artificial
Intelligence, Volume 1, Computer Science Department, Stanford
University, Stanford, California, August 1979, pages 342-344.

George R. Lewis, J. Shirley Henry, and Brian P. McCune, "The BTI 8000:
Homogeneous, General-Purpose Multiprocessing," in Richard E. Merwin,
editor, 1979 National Computer Conference, AFIPS Conference
Proceedings, Volume 48, AFIPS Press, Montvale, New Jersey, June 1979,
pages 513-528.

Cordell Green and Brian P. McCune, "Knowledge-Based Programming
Applications," Applications of Image Understanding and Spatial
Processing to Radar Signals for Automatic Ship Classification:
Proceedings of a Workshop, Naval Electronic Systems Command,
Washington, D.C., February 1979, pages 94-99.

Cordell Green and Brian P. McCune, "Application of Knowledge-Based
Programming to Signal Understanding Systems," Distributed Sensor
Nets: Proceedings of a Workshop, Computer Science Department,
Carnegie-Mellon University, Pittsburgh, Pennsylvania, December 1978,
pages 115-118.

Brian  P.  McCune,  "The  PSI  Program  Model  Builder:  Synthesis  of  Very 
High-Level  Programs,"  Proceedings  of  the  Symposium  on  Artificial 
Intelligence  and  Programming  Languages,  SIGPLAN  Notices,  Volume  12, 
Number  8,  SIGART  Newsletter,  Number  64,  August  1977,  pages 
130-139. 

201 San Antonio Circle
Mountain View, CA 94040


ABSTRACT 

This paper presents work in progress towards a
program development and maintenance aid called the
Intelligent Program Editor (IPE), which applies
artificial intelligence techniques to the task of
manipulating and analyzing programs. The IPE is a
knowledge based tool: it gains its power by
explicitly representing textual, syntactic, and many of
the semantic (meaning related) and pragmatic
(application oriented) structures in programs. To
demonstrate this approach, we implement a subset of
this knowledge base, and a search mechanism called
the Program Reference Language (PRL), which is able
to locate portions of programs based on a
description provided by a user.


This research was supported by the Air Force Office
of Scientific Research under contract F49620-81-C-0067,
the Office of Naval Research under contract
N00014-82-C-0119, and Rome Air Development Center
under contract F30602-80-C-0176.

1.  INTRODUCTION 


The effort and expense involved in software
maintenance have been recognized as a major
limitation on the capabilities of current software
systems. In a study on software maintenance issues in
the Air Force, we found that the process of
comprehending the form and function of existing
software (i.e., what it does and how it does it) is
the largest task in the maintenance process [2].

The basic cause of this "comprehension
problem" is the loss of knowledge during the
programming process, caused by factors such as poorly
written software, inadequate documentation,
programmer forgetfulness, and personnel turnover. To
address these issues, we have started a project to
develop intelligent, knowledge-based programming
aids, designed to help the programmer overcome
limitations of more traditional tools. This paper
describes the initial phase of one of these tools,
an editor known as the Intelligent Program Editor
(IPE). The following sections discuss the
motivation behind intelligent editing, the design of an
intelligent editor, a database for the editor, and
a scenario demonstrating an actual implementation
of a portion of the IPE's database, used in the
context of a program search.

2.  MOTIVATION 

An  intelligent  editing  system  is  a  sophisti¬ 
cated  tool  for  developing  and  maintaining  programs. 
The  goal,  insofar  as  it  is  possible,  is  to  decrease 
the  amount  of  information  a  programmer  needs  to 
supply  in  order  to  create  and  maintain  a  program, 
and  to  simultaneously  increase  the  reliability  of 
the  resulting  code.  This  can  be  accomplished  by 
incorporating  knowledge  about  the  structure  and 
intention  of  programs  into  the  editing  tools  used 
to  develop  and  maintain  them.  Perhaps  the  best  way 
to  illustrate  this  approach  is  to  present  an 
allegory  having  to  do  with  the  production  of  a 
technical  manuscript. 

Assume that there is a manuscript which needs
to be typed for publication. If it is given to a
typist who does not speak English, the result would
be, at best, a word-for-word copy of the original
manuscript. If it is given to an English-speaking
typist, simple errors, such as misspellings and
punctuation problems, might be fixed during the
typing process. If the manuscript is given to an
English teacher moonlighting as a typist, the
result might well be a version in which the prose
is smoothed and otherwise improved. Finally, if
one is lucky enough to find a typist familiar with
the domain of discourse (such as the author), the
resulting document might even have factual errors
corrected and incomplete thoughts identified.

A programmer selecting an editor system for
writing code is in a similar situation. A standard
text editor is comparable to the
non-English-speaking typist; text appears exactly as it is
typed, with no enhancements. The English-speaking
typist could be compared to a syntax-oriented
editor, which can eliminate syntactic program errors
and misspelled keywords. The English
teacher/typist knows about the language itself but
not about the content of the thoughts. This
situation is comparable to a programming
language-specific editor which applies knowledge about the
domain of programming; this editor can instantiate
general programming techniques, catch certain types
of semantic errors, make style suggestions, and
improve the overall flow of the program. The
technical typist who understands the content of
what is being said is analogous to an editor that
utilizes knowledge about the application domain; it
can help in algorithm development and can catch
certain types of pragmatic errors which are
dependent upon the specific application domain.

3.  THE  INTELLIGENT  PROGRAM  EDITOR 


The Intelligent Program Editor (IPE) described
in this paper most closely corresponds to the
English teacher/typist mentioned above, in that it
will support textual and syntactic manipulations,
and have the ability to assist in the implementation
of typical programming actions. This power is
obtained through the use of a database that explicitly
represents the functional organization of
programs in terms of textual, syntactic, and
intention-oriented structures. With this database,
the IPE is in a position to become more of a
programming environment than solely an editing tool.
In this vein, we are interested in addressing the
following design goals [5].

The  IPE  should  provide  a  means  for  naturally 
incorporating  documentation  into  the  program 
development  process.  In  our  view,  this  requires 
the ability to link documentation into the
organizational structure of a program (similar to
Nelson's [3] concept of Hypertext), and the ability
to actively use any information that is supplied
(to provide programmers with a motivation for
including descriptive data). In the IPE, documentation
will provide input to a program search facility.

The  system  should  support  incremental  program 
analysis. The object here is to employ the
system's understanding of program structure to
catch syntactic and certain semantic errors prior
to execution. Examples include identifying variables
that are accessed before being set (via data flow
analysis) and detecting programming cliches that have
been incompletely implemented. There is also a role for
error prevention: some editors (e.g., [6]) prevent
syntactic errors from ever occurring.
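
As an illustration of the "accessed before being set" check, the following is a minimal Python sketch, not the IPE's implementation. It assumes straight-line code only (no branches or loops), and the statement encoding and the name `used_before_set` are hypothetical choices made for this example.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding: (line number, variables read, variables written).
Statement = Tuple[int, Set[str], Set[str]]


def used_before_set(statements: Iterable[Statement],
                    initially_set: Iterable[str] = ()) -> List[Tuple[int, str]]:
    """Return (line, variable) pairs where a variable is read before any
    earlier statement has assigned it. Straight-line code is assumed."""
    defined = set(initially_set)
    problems = []
    for line, reads, writes in statements:
        # Reads are checked before this statement's own writes take effect,
        # so "X := X + 1" is reported when X has no earlier assignment.
        for name in sorted(reads - defined):
            problems.append((line, name))
        defined |= writes
    return problems


program = [
    (1, set(), {"total"}),              # total := 0
    (2, {"total", "count"}, {"avg"}),   # avg := total / count
    (3, {"avg"}, set()),                # put(avg)
]
print(used_before_set(program))
```

A real data flow analysis would also have to follow control flow; this sketch only shows the bookkeeping for a single path.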

The IPE will allow the user to employ alternate
program visualizations. This means allowing
the programmer to examine or modify code through
any of the representations mentioned above. For
example, a syntax based approach might be appropriate
during program construction, while a graphical
data flow display may be useful within the debugging
process.

All  of  these  capabilities  require  the  use  of 
multiple program representations, as well as
mechanisms for searching and manipulating the
information they contain. Therefore, in the first
phase of the IPE project, we constructed a prototype
version of this program database, called the
Extended Program Model (EPM), and demonstrated it
in the context of program search. The remainder of
this paper discusses the EPM and the search example
that was produced.


4. THE EXTENDED PROGRAM MODEL


The  Extended  Program  Model  (EPM)  provides  a 
new way of representing and accessing programs by
defining a vocabulary for discussing programs which
uses terms that are much closer to the ones which
users naturally employ. The EPM provides this
capability through the use of a database that
represents the structure of programs from a variety
of  views.  The  EPM  can  form  the  backbone  for  a 
number  of  systems  which  exhibit  a  deep  understand¬ 
ing  of  the  organizational  structure  and  meaning  of 
code. 

The  EPM  is  constructed  in  terms  of  two  major 
subsystems (see Figure 1): a program structures
database and a search and update component called
the Program Reference Language, which provides
access to the database. In addition, the EPM will
contain a library of "rational form" constraints
that will monitor program composition for its
structure and intentional content. As a whole, the
system can be thought of as a database management
system for creating and maintaining code. It provides
a search language for accessing its
knowledge, a facility for performing updates, as
well  as  a  set  of  semantic  integrity  and  consistency 
constraints  for  monitoring  the  validity  of  the  data 
it  contains. 

Figure 1. The Extended Program Model

4.1 THE PROGRAM STRUCTURES DATABASE

The  EPM's  program  structures  database  is  con¬ 
structed  in  terms  of  a  collection  of  representa¬ 
tions  which  reflect  the  transition  from  a  syntactic 
to  a  more  intention-oriented  analysis  of  code  (Fig¬ 
ure  2).  We  are  considering  these  viewpoints  to  be 
abstract  data  types  which  facilitate  different 
sorts of retrieval operations.

Figure 2. Representation Levels in the EPM


The textual representation gives the EPM the
view that most text editors provide. It is a
low-level approach, concerned with words and
delimiters, but it allows for important textual search
operations.

The  syntactic  viewpoint  embodies  the  rules  of 
grammar  for  particular  programming  languages.  The 
syntactic  database  provides  the  EPM  with  a  vocabu¬ 
lary  for  programming  constructs  such  as  "for" 
loops,  parameters,  and  procedures. 

The  next  level  of  representation  is  the  flow 
level, which provides standard data and control
flow  information.  It  provides  a  vocabulary  relat¬ 
ing  to  the  logical  structure  of  programs. 

The  segmented  parse  representation  defines  a 
vocabulary  for  a  program  in  terms  of  its  component 
data  and  control  flow.  For  example,  iterations  are 
decomposed  into  a  set  of  roles  which  identify  the 
subfunctions  of  a  loop.  In  the  breakdown  we  are 
using, loops contain generators, filters, terminators,
and augmentations [7]. Generators are segments
which produce a sequence of values. They can
be further refined into initializations and a body,
which is the portion that is executed many times.
Filters restrict that sequence of values. A terminator
is like a filter, except that it has the
additional  potential  to  stop  execution  of  the  loop. 
An  augmentation  consumes  values  and  produces 
results.  There  are  other  vocabulary  elements  for 
describing  straight  line  code. 
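
The four loop roles can be pictured with a small Python sketch. This is an analogy under stated assumptions rather than the segmented parse itself: each role is modelled as a function over a stream of values, and all names (`generator`, `terminator`, `filter_role`, `augmentation`) are hypothetical.

```python
from itertools import takewhile
from typing import Callable, Iterable, Iterator


def generator(start: int) -> Iterator[int]:
    """Generator role: an initialization plus a body executed many times."""
    i = start          # initialization
    while True:        # body
        yield i
        i += 1


def terminator(values: Iterable[int], keep_going: Callable[[int], bool]) -> Iterator[int]:
    """Terminator role: like a filter, but it can stop the whole loop."""
    return takewhile(keep_going, values)


def filter_role(values: Iterable[int], accept: Callable[[int], bool]) -> Iterator[int]:
    """Filter role: restricts the sequence of values."""
    return (v for v in values if accept(v))


def augmentation(values: Iterable[int], initial: int,
                 combine: Callable[[int, int], int]) -> int:
    """Augmentation role: consumes values and produces a result."""
    result = initial   # the augmentation's own initialization
    for v in values:
        result = combine(result, v)
    return result


scores = [70, 85, 0, 90]
indices = terminator(generator(0), lambda i: i < len(scores))
nonzero = filter_role((scores[i] for i in indices), lambda s: s > 0)
total = augmentation(nonzero, 0, lambda acc, s: acc + s)
print(total)
```

Note that the sketch has two initializations (the generator's start value and the accumulator's initial value), which is the kind of substructure the segmented parse makes explicit.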

The  taxonomy  discussed  up  to  this  point  pri¬ 
marily  captures  information  about  the  form  of  pro¬ 
grams  (as  opposed  to  their  meaning).  The  only 
semantic  elements  we  have  introduced  describe  the 
substructure  of  built-in  entities  such  as  loops. 
The  next  (more  abstract)  viewpoint  considers  pro¬ 
grams  to  be  built  of  objects  with  stereotyped  pur¬ 
poses.  These  are  called  typical  programming  pat¬ 
terns  (TPPs).  Examples  of  TPPs  include  variable 
interchanges,  list  insertions,  and  hash  table 
abstractions.  These  abstractions  are  the  tools 
employed by every expert programmer. Rich has
defined a library of such TPPs [4] (he uses the
term cliche; in this paper, we use both terms
interchangeably).

The  remaining  databases  (intentional  aggre¬ 
gates  and  documentation)  provide  methods  for  asso¬ 
ciating  the  intentions  behind  a  program  with 
specific  features  of  code.  They  capture  pragmatic 
knowledge relating to the domain of application of
the program. Intentional aggregates are a type of
formal documentation that allows the association of
larger  program  fragments  with  key  concepts  (sup¬ 
plied  by  the  user).  They  can  be  used  to  collect  a 
set  of  TPPs  and  other  program  segments  that  imple¬ 
ment  a  single  conceptual  function;  for  example,  a 
collection  of  TPPs  representing  queue  operations 
might  be  grouped  (by  the  user)  into  an  intentional 
aggregate  representing  a  scheduler. 

The  documentation  database  allows  the  user  to 
associate  comments  with  any  of  the  program  features 
already  described.  At  the  lowest  (i.e.,  textual) 
level,  this  would  take  the  form  of  in-line  com¬ 
ments.  At  other  representational  levels,  the  user 
could,  for  example,  document  the  data  flow  in  a 
particular  module  (saying  why  an  input-output  rela¬ 
tionship  occurs),  justify  his  use  of  particular 
TPPs,  or  explain  why  particular  syntactic  features 
are  employed.  The  advantage  of  this  technique  over 
current  documentation  practice  is  the  ability  to 
make  a  direct  association  (via  links  maintained  by 
the IPE) between the documentation and what it
talks  about,  at  an  appropriate  conceptual  level. 

4.2  KNOWLEDGE  ACQUISITION 


Since  the  EPM's  database  is  intended  to  sup¬ 
port  an  actual  editing  system  in  the  near  future, 
it  is  important  to  address  the  question  of  where 
its  information  is  obtained.  In  our  approach,  the 
different  knowledge  sources  are  acquired  in  part 
from  the  user,  and  in  part  by  automatic  means. 
Specifically,  the  syntactic  representation  can  be 
obtained  directly  from  the  textual  representation, 
and  the  segmented  parse  viewpoint  can  be  con¬ 
structed  through  data  flow  analysis  techniques  of 
the kind developed by Waters [7].

The  TPP  structures  are  harder  to  obtain. 
Recent  research  efforts  indicate  that  general 
recognition of cliches may be possible [1], but at
the  current  time,  these  techniques  have  not  actu¬ 
ally  been  demonstrated.  The  EPM  will  use  manual 
recognition  techniques  (at  least  until  automatic 
recognition  techniques  have  been  refined).  There 
are  two  manual  recognition  techniques  planned  for 
the  system.  In  the  first,  the  user  points  to  a 
piece  of  code  and  identifies  it  as  being  a  particu¬ 
lar  TPP  (as  a  way  of  documenting  the  system);  at 
this  point,  once  the  scope  has  been  narrowed  down, 
it  may  be  possible  to  identify  the  subcomponents  of 
these  programming  cliches  automatically.  In  the 
second method, the user uses TPPs for program
generation (as in [8]); by instantiating a TPP and
"filling  in  the  blanks,"  the  EPM  can  acquire  all 
the  necessary  information. 
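
The second acquisition method ("filling in the blanks") can be sketched as template instantiation. The Python below is a hypothetical illustration, not the EPM's mechanism: the `TPP` and `TPPInstance` classes, the role names, and the Ada-like template text are all assumptions made for the example.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict


@dataclass(frozen=True)
class TPP:
    """A typical programming pattern: a name plus code text with $role blanks."""
    name: str
    template: Template


@dataclass(frozen=True)
class TPPInstance:
    """What the database could record: which pattern, which roles, which code."""
    pattern: str
    roles: Dict[str, str]
    code: str


def instantiate(tpp: TPP, **roles: str) -> TPPInstance:
    # substitute() raises KeyError if any blank is left unfilled, so an
    # incomplete instantiation is detected at creation time.
    code = tpp.template.substitute(roles)
    return TPPInstance(tpp.name, dict(roles), code)


INTERCHANGE = TPP(
    "variable-interchange",
    Template("$temp := $a;\n$a := $b;\n$b := $temp;"),
)

swap = instantiate(INTERCHANGE, a="X", b="Y", temp="T")
print(swap.code)
```

Because the roles are supplied explicitly, no recognition step is needed: the instance already says which variable plays which part in the pattern.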


The intentional aggregate and documentation
views must be wholly obtained from the user. At a
minimum, the EPM's planned consistency mechanisms
will identify any of this information that may be
out of date due to modifications to the code.
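
One simple way to flag documentation that may be out of date is to store a fingerprint of the code region with each link and compare it later. The sketch below is an assumption about how such a check could work, not a description of the planned consistency mechanisms; `DocLink`, `attach`, and `stale_links` are hypothetical names, and the sketch assumes line numbers do not shift between checks.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DocLink:
    comment: str
    first_line: int   # 1-based, inclusive
    last_line: int    # 1-based, inclusive
    digest: str       # fingerprint of the region when the comment was written


def fingerprint(lines: Sequence[str], first: int, last: int) -> str:
    text = "\n".join(lines[first - 1:last])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def attach(comment: str, lines: Sequence[str], first: int, last: int) -> DocLink:
    return DocLink(comment, first, last, fingerprint(lines, first, last))


def stale_links(links: Sequence[DocLink], lines: Sequence[str]) -> List[DocLink]:
    """Return the links whose code region no longer matches its fingerprint."""
    return [link for link in links
            if fingerprint(lines, link.first_line, link.last_line) != link.digest]


source = [
    "SUM: REAL := 0;",
    "for I in 1..N loop",
    "   SUM := SUM + A(I);",
    "end loop;",
]
note = attach("Accumulates the test scores.", source, 2, 4)

edited = list(source)
edited[2] = "   SUM := SUM + A(I) * WEIGHT(I);"

print(len(stale_links([note], source)), len(stale_links([note], edited)))
```

The check only says that a comment may be stale; deciding whether it actually is remains the user's job.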

5. THE PROGRAM REFERENCE LANGUAGE


In order to demonstrate the feasibility of the
EPM, we implemented a portion of the database
described above, and built a version of the EPM's
search facility, the Program Reference Language
(PRL), which operates on that data. The PRL is a
tool  for  locating  regions  of  program  text  based 
upon  a  description  provided  by  the  user.  As  a  sup¬ 
port  system,  it  provides  programmers  with  an 
intention-oriented  vocabulary  for  specifying  por¬ 
tions  of  programs  in  situations  where  they  may  be 
unfamiliar  with  the  detailed  structure  of  the  code. 
This  might  occur  in  the  process  of  editing  programs 
which  may  be  too  large  to  remember  explicitly,  or 
in the act of understanding code which has rarely
been seen before (as is often the case in
maintenance).

The  PRL  demonstration  system  allows  program 
search  based  on  four  of  the  representations 
described  above,  namely  the  textual,  syntactic, 
segmented  parse  and  typical  programming  pattern 
views  (Figure  3).  These  databases  are  connected 
through  a  code  region  abstraction  that  associates 
program  features  with  physical  sections  of  program 
text. 


Figure  3.  The  Program  Reference 
Language  Implementation 

The  PRL  has  a  flat  information  structure.  It 
represents  each  database  in  terms  of  a  complex  tree 
or  graph  structure  of  frames.  Although  the  system 
can  arbitrarily  convert  between  viewpoints  by  using 
code  regions  as  an  intermediary,  the  databases  have 
no direct links between one another. These
conversions are inherently heuristic since the separate
representations do not necessarily have a one-to-one
correspondence. The information in each database
is either automatically derived, or can be
reasonably  obtained  from  the  user.  In  situations 
where  the  latter  is  necessary,  we  have  assumed  that 
information  may  be  provided  in  an  incomplete  form. 

5.1 CODE PAINTING

From  a  computational  point  of  view,  the  main 
problem  involved  with  this  multiple  representation 
approach  is  to  define  a  mechanism  that  is  able  to 
compare  information  obtained  from  the  different 
sources  of  knowledge.  The  PRL  accomplishes  this 
via  the  code  region  abstraction,  which  functions  as 
a  common  language  that  each  of  the  representations 
can  use  to  communicate. 

Code  regions  support  two  different  approaches 
to  search.  In  the  first  method,  which  we  call 
sequential  filtering,  the  user  makes  a  gross  stab 
at  selecting  a  code  region  by  generating  all  of  the 
elements which satisfy some fairly general condition.
He then sequentially restricts that set by
applying more and more conditions. For example, to
find  "the  loop  which  computes  the  sum  of  the  test 
scores",  he  locates  the  set  of  all  loops,  and  then 
restricts  it  to  the  ones  which  involve  test  scores 
and  summations. 
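
Sequential filtering amounts to applying one predicate after another to a shrinking candidate set. The following Python sketch shows the idea; the frame dictionaries, their keys, and the line numbers are hypothetical stand-ins for database entries, not the PRL's frame format.

```python
from typing import Any, Callable, Dict, Iterable, List

Frame = Dict[str, Any]
Condition = Callable[[Frame], bool]


def sequential_filter(candidates: Iterable[Frame], *conditions: Condition) -> List[Frame]:
    """Start from a broad candidate set and restrict it one condition at a time."""
    remaining = list(candidates)
    for condition in conditions:
        remaining = [frame for frame in remaining if condition(frame)]
    return remaining


# Hypothetical annotations for a small program with two loops.
frames: List[Frame] = [
    {"kind": "loop", "lines": (1, 4),
     "data": {"TEST-SCORES", "MAXSIZE", "TOTAL"}, "computes": set()},
    {"kind": "loop", "lines": (8, 10),
     "data": {"TEST-SCORES", "SUM"}, "computes": {"sum"}},
    {"kind": "assignment", "lines": (7, 7),
     "data": {"SUM"}, "computes": set()},
]


def is_loop(frame: Frame) -> bool:
    return frame["kind"] == "loop"


def uses_scores(frame: Frame) -> bool:
    return "TEST-SCORES" in frame["data"]


def computes_sum(frame: Frame) -> bool:
    return "sum" in frame["computes"]


matches = sequential_filter(frames, is_loop, uses_scores, computes_sum)
print([frame["lines"] for frame in matches])
```

The order of the conditions does not change the final set here, but starting with the most general one mirrors the "gross stab, then restrict" style described above.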

In  the  second  approach,  the  user  identifies  a 
collection  of  items,  possibly  from  several  dif¬ 
ferent  databases,  and  intersects  them  together  to 
find the elements which satisfy all of the conditions
he wants to impose. In this "code painting"
approach,  the  PRL  combines  these  items  essentially 
by  overlaying  the  corresponding  regions  of  code. 
For  example,  locating  "the  loops  which  compute 
sums"  is  done  (figuratively)  by  coloring  all  loops 
red  and  all  places  that  compute  sums  yellow.  Any 
region  which  comes  up  orange  has  all  of  the  proper¬ 
ties  that  were  desired. 

Code  painting  is  a  deliberately  coarse  affair. 
It  is  designed  to  exploit  the  kind  of  incomplete  or 
even slightly inaccurate information which the EPM
will  contain,  given  that  much  of  the  data  is  pro¬ 
vided  by  the  user.  In  some  cases,  code  painting 
may  not  identify  the  exact  section  of  the  program 
which  the  user  desired,  but  in  the  context  of  an 
interactive  system  with  a  screen  oriented  display, 
"close" will be good enough. To help the user see
the  effects  of  code  painting,  it  is  possible  to 
highlight  the  identified  section(s). 
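
The red-plus-yellow picture can be approximated by treating each code region as a range of line numbers and intersecting the painted lines. This Python sketch is a deliberate simplification (exact line-set intersection, whereas the conversions in the PRL are described as heuristic); the function names and the example line numbers are assumptions.

```python
from typing import Iterable, List, Set, Tuple

Region = Tuple[int, int]   # inclusive (first_line, last_line)


def paint(regions: Iterable[Region]) -> Set[int]:
    """Colour every line covered by at least one region."""
    lines: Set[int] = set()
    for first, last in regions:
        lines.update(range(first, last + 1))
    return lines


def overlay(*region_sets: Iterable[Region]) -> List[Region]:
    """Return the contiguous runs of lines painted by every region set."""
    painted = [paint(regions) for regions in region_sets]
    common = sorted(set.intersection(*painted)) if painted else []
    runs: List[Region] = []
    for line in common:
        if runs and line == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], line)   # extend the current run
        else:
            runs.append((line, line))        # start a new run
    return runs


loops_red = [(1, 4), (8, 10)]    # hypothetical: every loop in the program
sums_yellow = [(7, 10)]          # hypothetical: every place that computes a sum
print(overlay(loops_red, sums_yellow))
```

Lines painted by both sets play the part of the "orange" regions; any number of colours can be overlaid the same way.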


5.2 A SCENARIO USING THE PRL

The  following  example  shows  how  the  PRL  uses 
the code painting paradigm to answer the question
"find the initializations of the loop which computes
the sum of the test scores", given the Ada
program shown in Figure 4.

for MAXSIZE in 1..10 loop
   TOTAL := ARRAYSUM (TEST-SCORES, MAXSIZE);
   put (TOTAL);
end loop;


function ARRAYSUM (A: in ARRAY; N: in INTEGER) return INTEGER is
begin
   SUM: REAL := 0;
   for I in 1..N loop
      SUM := SUM + A(I);
   end loop;
   return SUM;
end ARRAYSUM;


Figure 4. The Ada Program Used in the Scenario.


In this example, the user starts by identifying
three sets of data, corresponding to the summation
TPPs, syntactic loops, and segmented parse
frames involving the test score array.

> (index 'summation tpp-database)

-> TPPset1

> (index 'loops syntax-database)

-> LOOPset1: [length 2]

> (index 'TEST-SCORES segp-database)

-> SEGset1: [length 6]


The  program  only  contains  one  TPP,  but  there 
are  two  loops,  and  several  segments  which  relate  to 
the  variable  TEST-SCORES.  It  is  important  to 
notice  that  all  of  these  segments  use  the  data  con¬ 
tained  in  the  variable  TEST-SCORES  but  do  not 
necessarily  refer  to  it  by  that  name  (for  example, 
the  literal  "A(I)"  in  the  ARRAYSUM  function 
accesses  the  test  score  array).  This  association 
is  apparent  from  the  data  flow  analysis  within  the 
segmented  parse. 

The  user  intersects  these  descriptions  by 
invoking the code painting paradigm. The code-painting
algorithm returns the largest region of
text  which  can  be  described  in  all  three  ways. 

> (overlay-code-regions TPPset1 LOOPset1 SEGset1)

-> CODE-REGION1

**for I in 1..N loop
SUM := SUM + A(I);
end loop;**

In  order  to  compute  this  information,  the 
overlay  function  automatically  converts  the  input 
sets  into  their  corresponding  regions  of  code. 
Most of these translations are automatically available
(though heuristic in nature). In the case of
the  TPP,  the  user  had  to  define  that  mapping  at 
some  time. 

At this point, the user has identified a loop
which computes the sum of the test scores. In
order to find the initializations of this code, he
views this region from the segmented parse perspective
(where initializations are represented explicitly),
and scans it for segments of the appropriate
type. This is a filtering operation, in which
the user applies restrictions to a previously
identified set of objects.


> (Filter (Segs-Within CODE-REGION1)
          '(Seg-Type "initialization"))
-> SEGset2: [length 2]


The PRL converts CODE-REGION1 to a set of segmented
parse frames (a heuristic process), and the
function Segs-Within enumerates the subsegments it
contains. The system identifies two initializations
as a result. The user prints them by converting
them to the textual view.


> (show! SEGset2)

-> for I in **1..N** loop
-> **SUM: REAL := 0;**


The  answers  correspond  to  the  initializations 
of the iteration variable "I", and the accumulation
variable,  "SUM".  Note  that  the  PRL  retrieves  the 
second  initialization,  even  though  it  is  lexically 
outside  of  the  summation  loop  itself.  It  is  iden¬ 
tified  from  the  segmented  parse  analysis,  which 
associates  a  loop  and  its  initializations  no  matter 
how  far  apart  they  might  have  been  in  the  original 
code.
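
The reason the second initialization is found can be shown with a small model of segmented parse frames. In this Python sketch a loop segment owns its role segments regardless of where their text sits; the `Segment` class, the helper names, and the line numbers (Figure 4 counted without blank lines) are assumptions for illustration, not the PRL's data structures.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Segment:
    seg_type: str
    region: Tuple[int, int]          # inclusive line range of the segment's text
    text: str
    parts: List["Segment"] = field(default_factory=list)


def segs_within(segment: Segment) -> Iterator[Segment]:
    """Enumerate all subsegments, depth first."""
    for part in segment.parts:
        yield part
        yield from segs_within(part)


def filter_by_type(segments: Iterable[Segment], seg_type: str) -> List[Segment]:
    return [s for s in segments if s.seg_type == seg_type]


summation_loop = Segment("loop", (8, 10), "for I in 1..N loop ... end loop;", [
    Segment("generator", (8, 8), "for I in 1..N", [
        Segment("initialization", (8, 8), "1..N"),
    ]),
    # The accumulator's initialization is linked to the loop by data flow,
    # even though its text (line 7) precedes the loop text (lines 8-10).
    Segment("augmentation", (7, 9), "SUM accumulation", [
        Segment("initialization", (7, 7), "SUM: REAL := 0;"),
        Segment("body", (9, 9), "SUM := SUM + A(I);"),
    ]),
])

inits = filter_by_type(segs_within(summation_loop), "initialization")
for seg in inits:
    print(seg.region, seg.text)
```

The lookup walks the role structure rather than the text, so lexical distance between a loop and its initializations does not matter.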


6.  CURRENT  STATUS  AND  FUTURE  WORK 


AI&DS is now developing a prototype version of
the IPE (in a three year, 2-3 person effort), which
is  intended  to  demonstrate  the  efficacy  of  our 
knowledge  based  approach  to  the  design  of  program¬ 
ming  support  tools.  The  prototype  will  embody  a 
portion  of  all  of  the  facilities  that  have  been 
described.  The  IPE  is  currently  targeted  for  the 
Ada  language.  It  will  initially  run  on  a  Symbolics 
3600,  a  fast,  personal  LISP  computer  that  features 
a high-resolution bit-map display, but it is being
designed  to  be  portable  to  other  systems  (in  par¬ 
ticular,  Unix). 

We expect to augment the EPM's database to
include more pragmatic information (e.g., the relation
between requirements and program structures),
and we intend to extend the PRL to the point where
it will be able to automatically plan and carry out
search requests of the kind demonstrated in this
paper (based on a single user query). When these
extensions are complete, the PRL will define a more
formal reference language.

The task of building a prototype for the IPE
involves  a  number  of  issues  including  the  incremen¬ 
tal  modification  of  databases,  and  the  recognition 
of  user  intentions  in  code.  In  order  to  solve 
these  problems  in  the  context  of  our  applied 
research,  we  expect  to  rely  heavily  on  methods  for 
eliciting  information  from  the  user,  and  to  focus 
on template-oriented techniques for manipulating
programs. 


We would like to thank Michael Brzustowicz and Eric
Domeshek  for  their  contributions  to  this  project. 

This appendix contains a reprint of the paper "An Informal Study of
Software Maintenance Problems," by Jeffrey S. Dean and Brian P. McCune.


[Software Maintenance Workshop, December 1983.]
ABSTRACT


A study of software maintenance problems was
performed as the first step of a project aimed at
suggesting advanced or novel techniques to increase
reliability and reduce costs during the maintenance
process. This paper summarizes some of the results
of the study.


INTRODUCTION


In an effort aimed at finding long term solutions
to the growing software maintenance problem, AI&DS
conducted a two year software maintenance study for
the Air Force [1]. The primary goal of this effort
was to identify advanced tools and techniques (with
particular emphasis on artificial intelligence
techniques) capable of significantly impacting the
software maintenance process within the next
decade. The project was divided into three major
phases: (1) studying the software maintenance
process and identifying the major problems; (2)
identifying tools and techniques; and (3)
evaluating these tools and techniques. This paper
summarizes our findings from the first phase of
the project.


WHAT IS MAINTENANCE?


For the purposes of this study, we used an
"inclusive" definition of maintenance:

   Software maintenance is all those activities
   associated with a software system after the
   system has been initially defined, developed,
   deployed, and accepted as operational.

Maintenance is primarily a reactive activity: it is
performed in response to requests (primarily
requests for modification of software), rather than
on the basis of some regular schedule.


This work was supported by Rome Air Development
Center under contract F30602-80-C-0176.


OVERVIEW OF AIR FORCE SITES


To gain a better understanding of the problems
encountered in large software maintenance
environments, we studied the maintenance efforts at
several Air Force C3I software organizations. The
study consisted of one or more days of interviewing
key personnel at each of the sites, followed by
questionnaires being sent to these sites.

Characteristics of the three Air Force sites were
collected during the interviews, and are summarized
below.

Site 1:

   application:  satellite tracking and control
   software:     integrated system, coded in Jovial J4
   size:         1 million lines
   hardware:     network of small, medium, and large
                 machines
   developer:    outside contractors
   maintainer:   ten different contractors
   process:      batch processing, core patching

Site 2:

   application:  communications
   software:     numerous systems, generally coded
                 in assembly language
   size:         systems range in size from 25,000
                 to 560,000 lines of code
   hardware:     variety of computers
   developer:    outside contractors
   maintainer:   in-house
   process:      maintenance generally done in batch
                 processing mode

Site 3:

   application:  wide variety, from data processing
                 to strategic planning
   software:     numerous systems, coded in a
                 variety of languages
   size:         24 million lines of code
   hardware:     wide range of computers
   developer:    systems developed by outside
                 contractors
   maintainer:   mostly in-house, with some outside
                 contractors
   process:      outdated tools

THE SOFTWARE MAINTENANCE SURVEY


After the interviews, we sent out an informal
survey to personnel at these sites. The purpose of
the survey was to gather more information about the
maintenance activities, to provide background and
motivation for later phases of the project. No
attempt was made to do as thorough or as
statistically sophisticated an approach as other
studies (such as [2]).

The survey was divided into three parts:

1. Reasons: "Why is software modified?"

2. Activities: "Where is time spent during
   maintenance?"

3. Problems: "Why is maintenance so difficult?"


Reasons for Software Modification


We divided modification requests into four
categories:

1. Correcting: "There was something wrong with
   the software."

2. Adapting: "Something the system depended upon
   has changed."

3. Perfecting: "We wanted to fine-tune the
   system."

4. Modifying: "We didn't like the system the way
   it was."

These categories are similar to those in the Lientz
and Swanson study [2], with the addition of one
more category (modifying). Respondents were asked
to estimate the percentage of requests that fell
into each category. The averaged responses, in
descending order, were as follows:

   REQUEST        PERCENTAGE

   modifying         46%
   correcting        31%
   perfecting        15%
   adapting           8%

Requests in the modifying category alone account
for almost half of the requests. Together with the
perfecting category (the other category for
"refinement" type requests), they account for over
60% of the requests (similar to [2]). Software
maintenance has often been thought of as repairing
software. However, these numbers indicate that
software evolution (i.e., refining, as compared to
repairing) is a significant part of the maintenance
phase.


Software Maintenance Activities


We divided software maintenance into a number of
activities, and asked respondents to rate the
importance of each activity on a scale from 0 to 10
(with 10 signifying "extreme amounts of time spent
on this task"). The averaged responses, in
decreasing order, were as follows:

   TASK                                         IMPORTANCE

   testing                                         6.3
   coding                                          6.3
   training of new personnel and users             4.8
   monitoring, problem detection, diagnosis        4.7
   design                                          4.4
   documentation                                   3.9
   management                                      3.6
   configuration control                           3.4
   analysis and specification of requirements      2.9

It is interesting to note that more time was spent
on lower level tasks (such as testing and coding)
than on higher level tasks (such as specification
and design). Unfortunately, our survey did not
probe sufficiently to determine the reasons behind
this distribution of effort; we cannot tell if
higher level tasks were neglected, or if lower
level tasks were just inherently more time
consuming. If higher level tasks are indeed being
neglected, this would most likely have a negative
impact on the overall maintainability of software.


Software Maintenance Problems


The last section of the survey identified four
major software maintenance problems that were
identified during interviews. Respondents were
asked to rate the importance of each problem on a
scale from 0 to 10 (with 10 signifying "extremely
important problem"). These averaged responses, in
descending order of importance, were as follows:

   PROBLEM                                      IMPORTANCE

   high turnover of personnel                      8.7
   understanding software/
      lack of good documentation                   7.6
   determining relevant places to make changes     6.9
   monitoring and diagnosing operations            6.3

The personnel turnover problem in the Air Force is
the result of an average two year rotation cycle
that causes a continuing, regular turnover.
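
The arithmetic behind the "over 60%" remark about modification requests can be checked directly. The short Python sketch below uses the four averaged percentages reported in the survey (46, 31, 15, and 8); the grouping into "refinement" and "other" follows the text, while the variable names are only illustrative.

```python
# Averaged survey responses, in percent of modification requests.
request_share = {
    "modifying": 46,
    "correcting": 31,
    "perfecting": 15,
    "adapting": 8,
}

total = sum(request_share.values())

# "Refinement" type requests are the modifying and perfecting categories.
refinement = request_share["modifying"] + request_share["perfecting"]
other = total - refinement

print(total, refinement, other)
```

The four categories sum to 100, and the two refinement categories together come to 61, which is the basis for describing maintenance as mostly evolution rather than repair.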

THE COMPREHENSION PROBLEM


The top three maintenance problems all appear to
revolve around a lack of understanding of the
software and of the maintenance environment. We
call this the comprehension problem. The relation
of comprehension/understanding to these problems is
clear:

- High turnover of personnel: Experienced
  personnel are replaced with new personnel who
  are unfamiliar with the applications software,
  and may be unfamiliar with the programming
  environment (tools, operating procedures, etc.)
  as well. The turnover rate is so high that
  there is little time allocated to update the
  documentation adequately.

- Difficulty in understanding software/lack of
  good documentation: Software to be maintained
  is hard to understand, particularly in the
  absence of current, high quality documentation.

- Determining all relevant places to make
  changes: Programmers have a hard time knowing
  where to make changes because they do not
  understand well enough how the code works.

ADVANCED TOOLS TO REDUCE THE COMPREHENSION PROBLEM


During the last two phases of this project, we
identified nine tools/techniques for improving the
maintenance process, and evaluated these ideas by
another set of surveys [1]. The highest ranked
tools address the problem of comprehension by
explicitly collecting information about programs,
documentation, and/or the programming process, and
helping programmers apply that information on a
regular basis.


CONCLUSIONS 


The results of the survey shed light on three
important issues in the maintenance process.
First, most of the requests for maintenance are
requests for refinement, rather than requests for
repair. This reinforces the idea that maintenance
is primarily a process of evolution. Second, most
of the time spent is spent on low-level tasks, such
as testing and coding. Finally, most of the
difficulty in the maintenance process appears to
arise from a lack of understanding of the
application software, as well as the maintenance
environment.


APPENDIX  C. 


This  appendix  contains  a  reprint  of  the  paper  "Trends  for  Advanced 
Software  Tools,"  by  Brian  P.  McCune  and  Jeffrey  S.  Dean. 


[Invited paper, EASCON '83, September 1983.]

ABSTRACT

A recently completed study determined the
major problems in the maintenance of Air Force
command, control, communications, and intelligence
software and proposed a number of advanced software
tools to deal with these problems. Most of these
advanced tools will rely on knowledge-based
techniques from the field of artificial intelligence
(AI). During the course of this research, a number
of general trends were noted in the characteristics
of these and other software tools, including both
AI and non-AI tools. Among these trends are the
use of knowledge of and reasoning about the domain
of application, the performance of tool activities
in small, incremental steps to provide better
feedback to the programmer, the increasing intelligence
of user interfaces to software tools, and the
maintenance and use of a global knowledge base
including a history of what has been done before
and why. This paper discusses these and other
trends for advanced software tools.


1. INTRODUCTION


The effort and expense of maintaining software
have been recognized as major limitations on the
capabilities of current software systems. The
difficulties arise for several reasons. First,
although hardware costs have decreased, software
expenses have skyrocketed due to the higher cost of
professional programmers. Second, as software
projects have become more and more ambitious, the
technical difficulty of making changes to the
resulting programs has increased dramatically. As
an illustration of this fact, the maintenance costs
for large systems typically surpass the funds
required for their initial development; the
Department of Defense now spends more than three
billion dollars per year on software maintenance.
These problems are addressed in part by the
creation of standardized structured languages such
as Ada, but in our opinion they will only be solved
by the results of new research into automated
programming support systems. We expect that many
such tools will rely on the application of
artificial intelligence (AI) techniques.


This research was supported in part by Rome Air
Development Center under contract F30602-80-C-0176
and by the Air Force Office of Scientific Research
under contract F49620-81-C-0067.

To gain better insight into the specific
problems of software maintenance, we performed a
study which analyzed software maintenance problems
in the Air Force [Dean & McCune-82]. The study
concluded that the process of comprehending the
form and function of existing software (i.e., what
it does and how it does it) is the most crucial
step in the maintenance process. A number of tools
were defined, each of which could provide a limited
operational capability in the short term (i.e.,
less than three years) and then gradually be
enhanced in the medium term (i.e., three to seven
years) and beyond.

This "comprehension problem" is revealed in
many ways. To begin with, most programming
installations have a high turnover rate of personnel
and have trouble finding qualified replacements. As
a result, maintenance personnel are often unfamiliar
with the programs that are being maintained. At
the same time, documentation is often unavailable
or of poor quality. This increases the difficulty
of comprehending a given program. It is not easy
to understand a program by directly reading the
code because of the quantity of detail involved and
also because coding standards are poorly enforced
and rarely agreed upon. Finally, the process of
isolating bugs, designing solutions, and
determining the ramifications of changes is
difficult in the presence of an incomplete
understanding of the program's organization. The
relative difficulty of this task is affected by the
tools available to the programmer.

The software maintenance study identified a
collection of tools designed to alleviate these
problems, all of which rely on a sophisticated
understanding of the structure of programs. In
effect, they operate by transferring some of the
expertise currently in the minds of programmers
into a machine-usable form that can be shared.
Three of the most relevant tool ideas are
summarized below. Advanced Information & Decision
Systems is actively working on all three of these
tools.

- The Programming Manager (PM) assists a
programmer by systematically applying administrative
and technical policies. It enforces some
procedures (e.g., testing of code before
installation), suggests others (e.g., notifying a
user group of a change), and automatically performs
some actions on its own. In order to perform
these functions, PM has a model of the underlying
environment and each tool in the environment,
including calling options and expected
output. The Programming Manager is also
intended to capture heuristic knowledge about
code, for example, that bugs in module A often
cause runtime errors in module B.

- The Intelligent Program Editor (IPE) is a
knowledge-based tool for supporting the
development and maintenance of software
[Shapiro & McCune-83B]. It embodies a deep
understanding of the structure of programs, of
techniques for searching for relevant parts of
programs based upon complex queries [Shapiro &
McCune-83A], and of the manipulations that
programmers typically apply to code. It can
provide access to a variety of other tools that
deal with code, e.g., the Documentation
Assistant described below.

- The Documentation Assistant is a system that
helps obtain, organize, access, and maintain
many different forms of documentation, ranging
from line-by-line comments to design principles
and application-oriented requirements that
underlie the structure of the code as a whole.
The Documentation Assistant is intended to
provide knowledge that other systems (such as the
IPE) can employ.
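
The three modes in which the Programming Manager applies policy (enforcing, suggesting, and acting automatically) can be illustrated with a minimal sketch. Everything in it is hypothetical: the names Kind, Policy, Report, and review, and the dictionary used as the project state, are assumptions made for illustration and do not describe the actual PM design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Kind(Enum):
    ENFORCE = "enforce"      # the action is blocked until the policy holds
    SUGGEST = "suggest"      # the programmer is reminded, but not blocked
    AUTOMATIC = "automatic"  # the manager carries out the step itself


@dataclass
class Policy:
    """Hypothetical policy record; not taken from the PM design."""
    name: str
    kind: Kind
    applies_to: str                    # action name, e.g. "install"
    satisfied: Callable[[Dict], bool]  # check against the project state


@dataclass
class Report:
    allowed: bool = True
    blocked_by: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)
    performed: List[str] = field(default_factory=list)


def review(action: str, state: Dict, policies: List[Policy]) -> Report:
    """Apply every unsatisfied policy that is relevant to the action."""
    report = Report()
    for policy in policies:
        if policy.applies_to != action or policy.satisfied(state):
            continue
        if policy.kind is Kind.ENFORCE:
            report.allowed = False
            report.blocked_by.append(policy.name)
        elif policy.kind is Kind.SUGGEST:
            report.suggestions.append(policy.name)
        else:
            report.performed.append(policy.name)
    return report


# Example policies, modeled on the two examples given in the text.
POLICIES = [
    Policy("test code before installation", Kind.ENFORCE, "install",
           lambda s: s.get("tests_passed", False)),
    Policy("notify user group of change", Kind.SUGGEST, "install",
           lambda s: s.get("users_notified", False)),
    Policy("record change in history log", Kind.AUTOMATIC, "install",
           lambda s: s.get("logged", False)),
]
```

Calling review("install", {"tests_passed": False}, POLICIES) in this sketch blocks the installation, lists the notification as a suggestion, and reports the logging step as performed by the manager.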


2. TRENDS


In surveying existing production and
research-prototype tools, as well as in our own
research efforts, some particularly important
trends and techniques have surfaced. These trends
represent paradigms for the entire programming
process, capable of forming the basis of a new
generation of programming tools. Other than that,
these trends are fairly dissimilar, varying in scope
from the very broad to the fairly specific.

The remainder of this paper discusses these
general trends that we see occurring now and into
the future of software tool development.
Definitions of specific tools that embody many of
these trends are presented in [Dean & McCune-82];
three of these were mentioned above. We confine
ourselves to a discussion of trends for tools, rather
than underlying programming or related languages.
We also assume that for at least the next decade
programming environments and the programming
process will evolve from the current state-of-the-art.
Thus, we do not speculate on the potential of
radical or revolutionary alternatives to programming,
such as automatic programming [Elschlager &
Phillips-82], in which a single monolithic tool
hides all processing details from the user.


We discuss nine important trends in
programming tools, programming environments, and
their use. These trends are

1.  Advanced  capabilities 

2.  Domain  knowledge  and  reasoning 

3.  Ability  to  be  tailored 

4.  Life-cycle  coverage 

5.  Tool  integration 

6.  Advanced  user  interface 

7.  Integrated  database 

8.  Incrementalism 

9.  Distribution 


2.1  ADVANCED  CAPABILITIES 

An obvious trend in software tools is that
toward more advanced capabilities. This arises in
part from the continuing drop in hardware prices
and the increase in both the demand for and the
price of skilled programmers. It makes economic
sense to automate more and more of the clerical
programming functions as additional cpu cycles
become cheaper than additional hours of human labor.

Probably  more  important  in  the  long  run  than 
cost  trade-offs  of  hardware  versus  people  are  the 
great  technical  advances  that  are  on  the  horizon. 
Advances  in  a  number  of  areas  are  going  to  have  an 
important  impact  in  the  next  decade.  Among  these 
technical  areas  are: 

- Artificial intelligence. Artificial
intelligence (AI) is the science and art of automating
problem-solving processes that are informal,
heuristic, and symbolic in nature. The
simplest definition of AI is any activity that is
performed by a non-human entity (typically a
digital computer) and that is usually
considered to require intelligence when performed
by humans. At the core of AI are two notions:
the complex manipulation of symbols (as opposed
to numbers or text), and the use of heuristics
("rules of thumb") that can guide one quickly
to a likely or satisficing solution. AI
systems usually perform complex inferencing that
involves combining the use of a number of
heuristics in an appropriate fashion to solve a
problem. Much of the research in applying AI
to programming has concentrated on fully
automating the process, except the
specification stage. AI techniques such as heuristic
reasoning, learning, natural-language
understanding, and representation of domain
knowledge may prove very useful when applied to
today's programming environments. (AI is now
being applied to numerous other defense
problems, such as ocean surveillance [Drazovich,
McCune, & Payne-82] and ship classification
[McCune & Drazovich-83].)

- Very high-level languages. A very high-level
language (VHLL) is a programming language that
provides capabilities significantly beyond the
capabilities offered by traditional high-level
languages. The level of a language refers to
its similarity or closeness to machine
language. Assembly language is a low-level
language; it maps directly into machine
language and requires the programmer to be
familiar with the basic operations of the target
machine. Languages like FORTRAN and PASCAL are
considered high-level programming languages
(HLLs); they provide the programmer with a
computational model that is somewhat higher than
machine level (e.g., by allowing the programmer
to talk about variables and loops, instead of
memory locations and jumps). Languages such as
APL and LISP are considered even higher level,
falling somewhere between HLLs and VHLLs; they
allow the programmer to talk about arrays,
lists, and the composition of operators.
Experimental VHLLs exist that provide
representation of sets and mathematical operations on
them (e.g., SETL [Kennedy & Schwartz-75]), or
objects and operations for specific application
areas. VHLLs remain a research topic because
the process of translating a VHLL program into
an efficient program is difficult. VHLLs can
still be effectively employed, by virtue of
their ability to reduce manpower costs.

- Program transformation. Program transformation
is the conversion of a program into another,
computationally "similar" program, where the
degree of similarity ranges from analogous to
equivalent. Transformations may be done for a
variety of reasons. If a program library
contains a routine similar to what the programmer
needs, it may be possible to automatically
transform that routine into the desired one; if
a program is written in a nonprocedural
specification language, it may be necessary to
transform the program into a more procedural
form before it can be translated into some real
programming language; if the program is written
using inherently inefficient constructs,
transformation can convert those constructs
into more efficient ones. Taking the idea of
transformation one step further, the entire
programming process can be thought of as a
series of program transformations or
refinements, going from a high-level specification to
the actual code.

- Formal verification. Formal verification is the
demonstration that a piece of program is
consistent with a given specification. This
demonstration is carried out as a proof within
the framework of a formal mathematical system
that in most cases is based on first-order
predicate logic. The specification formally
describes desired properties of the program; it
may give a complete specification of functional
behavior (relationship between input and output
values) or a specification of certain aspects,
like absence of particular runtime errors,
security of data flows, termination, or bounds
on running time. Formal program verification
is one form of program validation. It differs
from others by requiring rigorous and formal
specifications as well as the capability for
reasoning about programs, and in turn provides
a much higher degree of assurance that a
program indeed performs as specified.

- Symbolic execution. Symbolic execution means
evaluation of a program with symbolic values
instead of actual data. Symbolic execution
creates symbolic expressions that represent the
values of outputs as a function of input
variables, and (symbolic) predicates ("path
conditions") that characterize the subset of values
that cause the program to execute a particular
program path. Symbolic evaluation thus shows
the dependencies between the values of
different variables and between data and control
flows. Symbolic execution provides a versatile
and powerful tool for debugging and analyzing
programs. In comparison with ordinary testing,
one symbolic execution of a program may
correspond to a potentially large (even
infinite) number of normal test runs. Symbolic
execution may be considered a weak form of program
verification; it shares some of the problems of
verification systems.

- Graphics and other advanced input-output.
Graphics and other forms of advanced
input-output (I/O) are valuable in improving the user
interface. For comprehension of complex
information, graphical displays excel at helping
users reach their potential. At worst, they
can be used to mimic the linear textual output
of hardcopy terminals; more appropriately, they
can be used to display drawings and schematics
as well as dynamic ("moving") pictures.
Terminals with full-page, high-resolution displays
are now available (e.g., Xerox STAR, Apple
LISA, Symbolics 3600 LISP workstation). These
allow the use of screen pages that may be as
large as actual hardcopy pages; additional
software provides the capability for stacking
or overlapping these "windows".

- Software metrics. Measurements of performance
are necessary to judge both programmers and
software. Applied appropriately, these data can
be used to improve the software process
objectively. For example, performance statistics for
a programmer can be useful in determining
appropriate training courses; statistics about
the quality of a program can be used to help
decide if the program should be modified or
rewritten. Typical software metrics provide
quantitative measures of program complexity.
An example metric is the degree of
interconnectivity of a set of modules as determined by an
analysis of their data and control flow graphs.
These measures can be used to estimate
development or maintenance effort, to guide the
development and maintenance process, or to
predict the reliability (lack of errors) of a
program.

- Computer-assisted instruction. The field of
computer-assisted instruction (CAI) has been
attacking subjects such as logic and foreign
languages, as well as more elementary topics,
for some time. A small amount of work has been
done on teaching particular programming
languages and, to some extent, programming
techniques. To speed up the learning cycle,
the CAI system usually has access to the
programming tools for the appropriate language, in
order to compile and run programs and
automatically grade performance by examining their
output.
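
Of the technical areas listed above, symbolic execution lends itself most readily to a small worked illustration. The following is a minimal sketch, assuming a toy program representation (nested tuples for assignments and if-statements, with expressions kept as strings); the representation and the names substitute and execute are assumptions made for this example and are not taken from any system cited in this paper.

```python
import re


def substitute(expr, env):
    """Replace each variable in expr by its current symbolic value."""
    def lookup(match):
        name = match.group()
        return f"({env[name]})" if name in env else name
    return re.sub(r"[A-Za-z_]\w*", lookup, expr)


def execute(stmts, env, path):
    """Yield (path_condition, final_env) for every path through stmts.

    stmts is a list of ("assign", var, expr) and
    ("if", cond, then_block, else_block) tuples (toy representation).
    """
    if not stmts:
        yield path, env
        return
    head, rest = stmts[0], stmts[1:]
    if head[0] == "assign":
        _, var, expr = head
        new_env = dict(env)
        new_env[var] = substitute(expr, env)
        yield from execute(rest, new_env, path)
    elif head[0] == "if":
        _, cond, then_block, else_block = head
        c = substitute(cond, env)
        # Follow both branches, recording the condition that selects each.
        yield from execute(then_block + rest, env, path + [c])
        yield from execute(else_block + rest, env, path + [f"not ({c})"])


# r = 2 * |x - y|, executed with symbolic inputs X and Y.
PROGRAM = [
    ("if", "x > y",
     [("assign", "d", "x - y")],
     [("assign", "d", "y - x")]),
    ("assign", "r", "d * 2"),
]

paths = list(execute(PROGRAM, {"x": "X", "y": "Y"}, []))
for condition, final in paths:
    print(" and ".join(condition), "=>", "r =", final["r"])
```

Each of the two results covers every concrete input satisfying its path condition, which is the sense in which one symbolic execution stands for many ordinary test runs.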


2.2  DOMAIN  KNOWLEDGE  AND  REASONING 

By domain we refer to an area of expertise,
such as programming or a particular application
area. Knowledge of and reasoning about a specific
domain can be quite useful in a programming support
environment. This is nicely illustrated with an
example from an analogous situation. Suppose you
had a technical manuscript that needed to be typed.
If you gave the manuscript to a typist who spoke no
English, you would expect, at best, a word-for-word
typewritten copy of the manuscript. If you gave it
to an English-speaking typist, you would hope that
simple errors, such as misspellings and punctuation
errors, would be fixed. If you gave it to an
English teacher moonlighting as a typist, you
wouldn't be surprised to find that some of your
prose had been improved upon. And if you were
lucky enough to find a typist familiar with the
domain of discourse (of the manuscript), you
shouldn't be surprised to find factual errors
corrected.

The problem of getting the manuscript typed
with the best possible result is similar to the
problem of writing a program. You select some type
of editor to use in entering the program text. A
standard text editor would be comparable to the
non-English-speaking typist: text is entered
exactly as typed, with no enhancements. The
English-speaking typist could be compared to a
syntax-oriented editor, which can eliminate
syntactic program errors and misspelled keywords (e.g.,
GANDALF's editor, MENTOR, Cornell Program
Synthesizer). The other two typists have a fair
degree of knowledge and understand how to apply it.
The English teacher/typist knows about the language
itself (rather than the content of what is being
said). This situation is comparable to a
programming language-specific editor, which applies
knowledge about the domain of programming; the
editor can help with general programming techniques,
can catch certain types of semantic errors, can
make style suggestions, and can improve the general
flow of the program. The technical typist who
understands the content of what is being said is
analogous to an editor that utilizes knowledge
about the application domain. It can help with
domain-specific techniques, such as algorithm
development, and can catch certain kinds of
pragmatic errors that are dependent upon the specific
application domain.
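
The second level of the analogy, an editor that repairs misspelled keywords, can be sketched in a few lines. The sketch below is an illustration only: the keyword list, the function name correct_keyword, and the similarity cutoff of 0.75 are assumptions, and the approximate matching comes from Python's standard difflib module rather than from any of the editors named above.

```python
import difflib

# Hypothetical keyword set for a small PASCAL-like language.
KEYWORDS = ["begin", "end", "while", "repeat", "until",
            "procedure", "function"]


def correct_keyword(word, cutoff=0.75):
    """Return the keyword that word most plausibly misspells, else word.

    A word is replaced only when it is not already a keyword and is
    sufficiently similar (per difflib's ratio) to exactly one best match.
    """
    if word in KEYWORDS:
        return word
    matches = difflib.get_close_matches(word, KEYWORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else word


for typed in ["procedrue", "whlie", "count"]:
    print(typed, "->", correct_keyword(typed))
```

A purely lexical check like this knows nothing about the program's meaning; the higher levels of the analogy require knowledge of programming and of the application domain that string similarity cannot supply.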

So, in a programming support environment, it
is desirable to have two types of expertise:
programming expertise and application expertise.
An "ultimate" goal might be to endow the system
with expertise equaling that of a human; the system
would exhibit programming expertise comparable to
that of a computer scientist and application
expertise similar to that of a domain specialist. Note
that in a programming support environment, the
latter type of knowledge is more specialized, hence
less widely applicable (a new knowledge base is
needed for each application area).

The  use  of  domain  knowledge  and  reasoning  in 
the  programming  environment  will  drastically  change 
the  whole  concept  of  programming.  It  will  allow 
the  software  tools  to  truly  help  the  programmer, 
freeing  the  programmer  to  concentrate  on  higher- 
level  issues. 

2.3 ABILITY TO BE TAILORED

Future tools will have the ability to be
highly tailored to suit the needs of the particular
situation, including management hierarchy,
application domain, other tools available in the
programming environment, and idiosyncrasies of the tool
users. Obviously, this level of variability goes
well beyond the simple parameterization or runtime
options found in many tools today. Modeling large
bodies of facts and preferences requires knowledge
representation techniques from AI. Modifying these
bodies requires an ability to elicit new or
modified knowledge from users or to learn by
observation. These areas are among the most difficult in
AI research, but the potential is tremendous.

2.4  LIFE-CYCLE  COVERAGE 

Future  environments  will  have  capabilities 
that  support  more  of  the  software  life-cycle.  Most 
important,  maintenance  of  software  after  initial 
release  is  recognized  as  typically  requiring  two- 
thirds  of  the  overall  lifetime  costs  of  a  software 
system.  Therefore  tools  must  be  designed  with 
maintenance,  as  well  as  development,  in  mind.  Some 
tools  may  be  developed  solely  for  use  in  the 
maintenance  phase. 

The use of tools to date has been concentrated
in the coding and testing phases of software
development. There is an obvious reason for this:
source and executable code and data are often the
only forms of information stored in the computer
and therefore available to tools. This situation
is slowly changing, as other forms of information
are formalized and automated, ranging from
requirements and design specifications, to formal
documentation and test data specifications, to management
schedules and methodology descriptions, to
measurements gained by applying software metrics. For
each new type of information, tools are needed to
assist in its creation, analysis, and
transformation into other types of information.

Because so much more information than just
code will be dealt with by tools, many future tools
will be independent of a specific programming
language.


2.5  TOOL  INTEGRATION 

There is a saying that "the whole is more than
the sum of its parts". This notion of synergy is
important in the design of software tools. When
several tools work together, they may provide
something that none of them could provide alone. The
term integration is used here to refer to the
degree of synergy and close coupling between tools.
Tools in a well-integrated system exhibit a large
degree of synergy (as a result of working well
together). Since synergy results from
interdependencies, integrated tools are likely to share
information, share common procedures, or provide
complementary functions. Systems such as INTERLISP
[Teitelman & Masinter-81] and UNIX [Kernighan &
Mashey-81] owe a large amount of their success to
their integration.

Well-integrated systems provide several
important advantages:

- Human comprehension is aided by the uniformity
provided by a well-integrated system. A
consistent underlying philosophy aids users in
making inferences about how the system works.

- An integrated tool set allows one to put tools
together quickly in order to perform tasks that
may not have even been envisioned by the system
designers. This benefit is well known to UNIX
users.

- Integrated tools work together, allowing more
efficient and effective performance.
Efficiency is gained when one tool can make use of
another's work, eliminating redundant
computations; for example, symbol tables created by a
compiler can be used by debuggers, linkers,
cross-reference listers, etc. Effectiveness is
increased when tools can make use of each
others' information; for example, a compiler may
be able to apply optimizing code generation
strategies by getting information from a
program verifier that the compiler cannot deduce
by purely syntactic means.
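
The symbol-table example in the last item can be made concrete with a short sketch. It assumes Python source as the language being analyzed and uses the standard ast module as a stand-in for a compiler front end; the function names build_symbol_table, cross_reference, and undefined_names are hypothetical. The point illustrated is only that the table is computed once and then consumed by two independent tools.

```python
import ast
from collections import defaultdict


def build_symbol_table(source):
    """Front end: record the lines where each name is defined and used."""
    table = defaultdict(lambda: {"defined": [], "used": []})
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                table[node.id]["defined"].append(node.lineno)
            elif isinstance(node.ctx, ast.Load):
                table[node.id]["used"].append(node.lineno)
    return {name: {kind: sorted(lines) for kind, lines in entry.items()}
            for name, entry in table.items()}


def cross_reference(table):
    """Tool 1: a cross-reference listing, built from the shared table."""
    return [f"{name}: defined {entry['defined']}, used {entry['used']}"
            for name, entry in sorted(table.items())]


def undefined_names(table):
    """Tool 2: names used but never assigned in this source text."""
    return sorted(name for name, entry in table.items()
                  if entry["used"] and not entry["defined"])


SOURCE = "total = 0\ncount = 3\nmean = total / count\nprint(mean, limit)\n"
TABLE = build_symbol_table(SOURCE)   # computed once, shared by both tools
print("\n".join(cross_reference(TABLE)))
print("not assigned here:", undefined_names(TABLE))
```

In this sketch the second tool reports built-in names such as print alongside genuinely missing ones, because the table records only what appears in the source text; a more integrated environment would let it consult further shared information to tell the two apart.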

Related to integration is the idea of
completeness. Completeness means that the user should
be able to do everything that might be needed. The
beauty of an integrated system is marred when a
user has to expend a large amount of energy to do
something that is conceptually simple but that
is not allowed by the system. For example, the
INTERLISP system allows certain common
monitor-level commands to be performed without leaving the
system. To perform other commands, there is a
simple interface that creates a new process running
the operating system's command processor, allowing
the user to execute arbitrary commands and then
return to INTERLISP without any loss of continuity.

Designing a consistent yet usable system
requires a great deal of ingenuity and insight on
the part of the designers. But the effort does pay
off: consider the popularity of UNIX. The
Stoneman requirements for Ada Program Support
Environments (APSEs) also specify an integrated tool set;
the success of this requirement will be seen as
APSEs are implemented and used.

2.6  ADVANCED  USER  INTERFACE 

Future tools will have very advanced
interfaces to programmers and other users. The most
visible part of the interface is the collection of
I/O techniques available. Input techniques may
range from selection of commands from menus and
pointing using a mouse or other cursor positioning
device to natural-language input and speech input.
Output techniques include high-resolution graphics
that are capable of displaying publication-quality
program listings, the use of color to aid in
focusing attention, and speech output.

Despite the current interest in advanced I/O
techniques, the use of these techniques alone
cannot solve software development and maintenance
problems. The primary difficulty is in deciding
what information to communicate to the programmer,
rather than how to communicate it. A tool that
uses some combination of advanced I/O techniques
can be changed to work with simpler methods,
usually without a significant loss in information or
functionality.

Many tools converse with the programmer
interactively. For such a tool to be used
effectively, the user must understand what the tool
is saying and how to respond to it. On-line help
facilities can teach generic command structures,
but tools must also be able to explain the details
of the current situation. Knowledge-based AI
systems are generally capable of explaining their
current state and what chain of reasoning got them
there.

Going even further, advanced user interfaces
should provide some of the intelligence and
assistance that a human programming assistant might
provide. An intelligent user interface not only makes
life easier for the programmer; it helps increase
programmer productivity and software reliability.
Here are some of the kinds of features that an
intelligent user interface might provide:

- Programmability. The user interface in a
programming support environment should provide the
programmer with tools for automating his own
tasks, either by the programmer explicitly
programming the tasks or by the system learning.

- Error prevention. By making "bad" things hard
to do, it is less likely that they will be done
inadvertently. Warnings about dangerous
actions, before they are performed, further
reduce the chance of error.

- Error detection and correction. It is not too
difficult to catch many types of errors
automatically. Every attempt should be made to
catch errors as early as possible; the later an
error is detected, the more expensive it is to
fix. Error diagnostics should be meaningful to
the user, not only to the person who wrote
them. Some errors, especially silly, careless
ones (e.g., spelling errors), can be corrected
without too much difficulty.

- Recoverability. If an error is made, the user
should be able to recover as easily as
possible. The system should have safeguards to
provide the user with certain paths of recourse,
e.g., by allowing actions (such as deleting a
file) to be undone. "Forgiveness" is
important. Making a blunder is bad enough; one
should not have to spend hours or days to right
it.

- Active help. If the user repeatedly does things
incorrectly, there may be no need to wait until
help is requested. In many cases, the user may
not be aware that help exists, or may not know
how to ask for it. Help should be offered
automatically.

- Non-interactive operation. The system should be
able to function without human intervention if
necessary. If a programmer leaves the terminal
while performing a task, there is oftentimes no
need to bring things to a halt when only
non-crucial human input is needed.
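
Recoverability, as described in the list above, amounts to recording how to reverse each action as it is performed. The following minimal sketch assumes an in-memory dictionary standing in for a file system; the class name Workspace and its methods are hypothetical and are offered only to illustrate the idea of an undoable delete.

```python
class Workspace:
    """Toy file store in which every change can be undone (illustrative)."""

    def __init__(self):
        self.files = {}
        self._undo = []   # stack of functions that reverse past actions

    def write(self, name, text):
        existed = name in self.files
        old_text = self.files.get(name)
        self.files[name] = text

        def restore():
            if existed:
                self.files[name] = old_text
            else:
                del self.files[name]

        self._undo.append(restore)

    def delete(self, name):
        old_text = self.files.pop(name)   # raises KeyError if absent

        def restore():
            self.files[name] = old_text

        self._undo.append(restore)

    def undo(self):
        """Reverse the most recent action; return False if none remain."""
        if not self._undo:
            return False
        self._undo.pop()()
        return True


ws = Workspace()
ws.write("main.pas", "version 1")
ws.delete("main.pas")          # the blunder
ws.undo()                      # the file is back
print(ws.files)
```

Keeping the reversal alongside the action makes the cost of a blunder one command rather than hours of reconstruction, at the price of retaining old contents until the undo stack is discarded.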

In order to accomplish these goals, an
advanced user interface must have models of both
the user and the process by which tools are used.
It is necessary to understand the programmer's
actions (what he is doing) and intentions (what he
will be or wants to be doing). For example, an
editor incorporating programming domain knowledge
needs to know what parts of a program the
programmer will be writing or changing, as well as the
(expected) effect this will have on other parts of
the system. An editor incorporating application
domain knowledge needs to know what techniques the
programmer will be utilizing, as well as the type
of output and results expected. As soon as an
environment has a model of what tools are available
and how to access them, it is feasible to construct
comprehensive or meta-tools: tools that reason
about and invoke other tools on behalf of the user.

To  help  the  programmer  make  decisions  about 
what to do, tools need to understand the programming
process itself in order to determine what the
programmer is doing right or wrong. When a
specific methodology is chosen for an environment,
software  tools  should  be  provided  to  aid  each  step 
of  the  methodology. 

2.7 INTEGRATED DATABASE

Information in most programming environments
is  stored  as  a  set  of  individual  files  of  various 
types. This is essentially just a classical file
system  as  provided  by  most  operating  systems.  The 
next generation of environments will probably use
something  closer  to  a  relational  model  of  data,  so 
that  uniform  random  access  is  possible  to  all 
objects  and  so  that  complex  relational  objects  such 
as  structured  documentation  can  be  easily  stored 
and accessed. As more AI-based tools are incorporated
into an environment, the database will
evolve into a full-fledged knowledge base that
incorporates  all  of  the  previous  information  plus 
complex  semantic  models  and  sets  of  heuristics. 

The history list or audit trail is one database
component that is recognized as important by
most state-of-the-art environments (e.g., INTERLISP,
APSEs). The notion of computational history
refers  to  the  information  available  during  the 
course  of  some  computation.  For  example,  when 
using  a  text  editor,  the  history  includes  the  edit¬ 
ing  commands  as  well  as  the  inserted  and/or  deleted 
text; when using a compiler, the history includes
the  original  source  code,  the  parse  trees,  parse 
tree  transformations,  and  generated  code.  Some  of 
this  information  has  no  long-term  value  beyond 
immediate  consumption  by  a  program;  but  much  of  the 
information  is  quite  valuable,  either  because  it  is 
expensive  to  recompute  (e.g.,  parse  trees  for  a 
large  module)  or  because  it  cannot  be  recomputed 
(e.g.,  a  record  of  all  operations  performed  by  the 
user).

There  are  numerous  reasons  why  history  is  a 
necessary  ingredient  in  advanced  programming 
environments.  First  of  all,  sophisticated  program¬ 
ming  environments  must  allow  programmers  to  make 
changes  incrementally,  so  that  the  cost  of  making 
small changes is small. To accomplish this, intermediate
results of various system tools and utilities
(e.g., compilers, linkers) must be kept
around. Another need is accountability: records
of  all  important  activities  should  be  maintained  so 
it  can  always  be  determined  what  has  been  done  and 
who  has  done  it.  Important  activities  include 
things  like  changes  to  code,  document  updates,  sys¬ 
tem  builds,  etc.  From  the  perspective  of  the  user 
interface, preservation of a history is also desirable.
Some programming systems, such as INTERLISP,
allow the user to see a record of what has been
done  and  allow  transactions  to  be  "replayed". 
Finally, history is necessary for the application
of programming domain knowledge and reasoning: to
understand what the programmer is doing, it is
necessary  to  understand  the  context  in  which  the 
programmer  has  been  working. 
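
The accountability and "replay" uses of a history list described
above can be sketched as follows. This is an illustrative sketch
only, not the mechanism of INTERLISP or any APSE; the names
HistoryEntry, Session, and replay, and the toy editing commands,
are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class HistoryEntry:
    user: str        # who performed the operation
    command: str     # which operation was performed
    args: Tuple      # the arguments it was given


@dataclass
class Session:
    """Hypothetical session that logs every command it executes."""
    commands: Dict[str, Callable]
    history: List[HistoryEntry] = field(default_factory=list)

    def run(self, user, command, *args):
        result = self.commands[command](*args)
        self.history.append(HistoryEntry(user, command, args))
        return result

    def who_did(self, command):
        """Accountability query: users who issued a given command."""
        return [entry.user for entry in self.history
                if entry.command == command]


def replay(history, commands):
    """Re-execute recorded transactions, in order, against a command table."""
    for entry in list(history):
        commands[entry.command](*entry.args)


# Toy editing session over a list of text lines.
buffer = []
session = Session({"insert": buffer.append, "delete_last": buffer.pop})
session.run("alice", "insert", "x = 1")
session.run("bob", "insert", "y = 2")
session.run("alice", "delete_last")

# Replaying the same history against a fresh buffer reproduces its state.
fresh = []
replay(session.history, {"insert": fresh.append, "delete_last": fresh.pop})
```

Because the log records who issued each command as well as what
was done, the same structure serves both the accountability and
the user-interface purposes mentioned in the text.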

2.8  INCREMENTALISM 

Support  of  incremental  change  is  vital  for  the 
maintenance  of  all  but  the  smallest  systems.  It  is 
unacceptable  and  unnecessary  to  require  a  whole 
system  to  be  rebuilt  each  time  a  small  change  is 
made:  unacceptable  because  the  cost  is  too  high, 
unnecessary  because  changes  usually  leave  many 
parts of the system unaffected.

The  move  toward  building  systems  that  handle 
incremental  change  has  been  slow,  primarily  since 
it  is  (in  general)  more  difficult  to  build  tools 
that  are  incremental.  There  are  several  problems. 
First  of  all,  new  algorithms  may  have  to  be  devised 
or old "batch" algorithms modified, in order to
handle  incremental  requests.  Another  problem  is 
lack  of  information:  most  tools  throw  out  informa¬ 
tion  as  soon  as  they  are  done  with  it,  rather  than 
leaving  it  around  for  future  reference.  An  example 
of this is symbol table information, which the
compiler builds up for each module and then usually
discards. This means that the symbol table must be
rebuilt for each recompilation, even if code
changes had no effect on it.
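
One way to avoid the redundant rebuild described above is to retain
each module's symbol table between compilations and reuse it when
the declarations it depends on are unchanged. The sketch below
assumes a toy language in which declarations are lines of the form
"var NAME: TYPE"; the syntax, SymbolTableCache, and the helper
functions are all hypothetical and stand in for a real compiler's
structures.

```python
import hashlib
import re

# Toy declaration syntax (an assumption of this sketch): "var NAME: TYPE"
DECL = re.compile(r"^\s*var\s+(\w+)\s*:\s*(\w+)\s*$")


def declaration_lines(source):
    """Return the lines of a module that declare symbols."""
    return [line.strip() for line in source.splitlines() if DECL.match(line)]


def build_symbol_table(decls):
    """Map each declared name to its type."""
    return {m.group(1): m.group(2) for m in map(DECL.match, decls)}


class SymbolTableCache:
    """Keeps each module's symbol table and rebuilds it only when the
    module's declarations change."""

    def __init__(self):
        self._tables = {}   # module name -> (declaration digest, table)
        self.rebuilds = 0   # counts actual rebuilds, for illustration

    def symbols(self, module, source):
        decls = declaration_lines(source)
        digest = hashlib.sha256("\n".join(decls).encode("utf-8")).hexdigest()
        cached = self._tables.get(module)
        if cached is not None and cached[0] == digest:
            return cached[1]            # declarations unchanged: reuse
        table = build_symbol_table(decls)
        self.rebuilds += 1
        self._tables[module] = (digest, table)
        return table
```

In this toy setting finding the declarations is nearly as cheap as
building the table; in a real compiler the saving comes from
skipping the far more expensive analysis behind each table entry.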

Incrementalism  is  a  technique  vital  for  the 
development  and  maintenance  of  large  systems,  yet 
few existing programming tools make use of it.
Aside from some research on incremental techniques
for syntactic parsing, most attempts at incorporating
incrementalism have been somewhat ad hoc (and
less than generally applicable). The idea of
incrementalism  falls  out  naturally  when  some  of  the 
other  techniques  discussed  earlier  in  this  section 
(e.g., history) are incorporated into the programming
environment.

2.9  DISTRIBUTION 

As more of the non-coding functions of the
software  life-cycle  are  automated,  it  becomes 
clearer  that  the  model  of  a  single  programmer 
interacting  with  a  unique  copy  of  the  environment 
and  database  is  not  adequate.  Large  programming 
projects  have  numerous  individuals  operating  asyn¬ 
chronously.  These  personnel  may  have  different 
functions (e.g., supervisor, designer, coder, tester,
documenter), different physical locations,
different "home" computers, etc. Thus, programming
environments are fast entering the era of distributed
systems and processing, with all of the standard
problems of planning and coordination, synchronization
of computer objects and events,
maintenance  of  multiple  copies  of  objects,  etc. 

A  number  of  architectures  are  possible  for 
distributed  programming  environments.  The  simplest 
has  each  programmer  accessing  the  primary  develop¬ 
ment  computer  (probably  a  mainframe)  via  a  front- 
end  computer  (probably  an  advanced  personal  com¬ 
puter).  A  few  of  the  more  interactive  tools  (e.g., 
editor,  language  interpreter)  would  run  on  the 
front-end, while most would remain on the mainframe
(e.g.,  optimizing  compiler,  database  handler).  A 
more distributed architecture would have copies of
each  tool  at  each  node,  with  no  node  being  central. 
Finally,  individual  tools  may  also  be  distributed 
across  processors  someday. 


3.  CONCLUSION 


We  have  presented  nine  trends  that  cut  across 
many  possible  programming  tools  and  support 
environments. Each trend will eventually require
the use of AI to achieve its potential. In this
role, AI is not a radical, high-risk approach to
the software problem, but a technology that will
make  possible  major  enhancements  to  every  aspect  of 
the  programming  paradigm  of  today. 